The extensive use of new-type tests for measuring academic
achievement has not been accompanied by research into the reliability
or validity of the items constituting the examinations to a degree
comparable to that devoted to intelligence and aptitude tests. Yet
the scores attained in college examinations are freighted with grave
meaning for a large number of students. Performance in college
examinations carries its own reward or punishment, as, for example,
recommendation for positions, for academic advancement and
honors, or denial of access to continuation courses. It might be argued,
therefore, that as much or more care should be devoted to the
preparation of course examinations as to intelligence tests. Rarely, it is
to be hoped, is the score on one intelligence test, or even two for that
matter, taken as seriously as the score in a final examination or in the
combined scores of final and mid-quarter tests. But college
instructors have determined the advancement of students in specific fields
of study and have prevented, unfortunately, the pursuit of training
by skilled students in markedly dissimilar and highly specialized fields
by examinations that never have been calibrated for accuracy or
reliability. No brief is made for the music student who cannot
advance in his specialty within the college of his choice because of
failure in psychology. The system of education permitting irrelevant
fields of knowledge to determine competency for instruction in other
fields of endeavor is in error. Yet, an instructor who makes no
inquiries into the validity and reliability of his examinations must
share in the error, for students may be rated both too high and too
low by hastily prepared and complacently given new-type tests.
The widespread use of new-type tests constitutes a challenge to
explore their internal structure in the interest of better representing
the students’ grasp of knowledge.

The writers acknowledge their indebtedness to the National Youth
Administration for the services of Federal Aid Students in validating
examinations given in the Department of Psychology at the University
of Minnesota.

Lest it be assumed that insistence is upon longer and more fre- 
quently given new-type tests than is now the practice, the reader is 
referred to a recent publication.? Matters of length and frequency 
of administration of tests lend themselves to statistical demonstration. 
Rather, the present emphasis is upon item analysis, the extent to 
which individual items do what the total score does. It is assumed 
that items which are failed by all students are poor items, because 
they have not discriminated students concerning whom there is a 
settled conviction regarding differences in grasp of appropriate informa- 
tion. Similarly, low validity is present when an item fails to differ- 
entiate students whose total scores in the test differ markedly. It is 
contended that new-type tests have been cluttered with items con- 
tributing nothing but confusion to the interpretation of the total 
score. Items too frequently are passed by a greater percentage of 
students with lower grades in the test than are passed by students 
with higher grades. Perfect inversions would be preferable to the 
inconsistencies not rarely found. What deserves emphasis is the 
contention that with the greatest care given to test construction a 
discouragingly large proportion of items will not discriminate students 
receiving different examination grades. These matters have been 
neglected in the enthusiasm for quickly scored tests, and, it is to be 
feared, for numerical indices concerning whose meaning little thought 
has been given. Fortunately, it is possible to improve examinations 
by careful investigation into the examinations themselves. Certain 
generalizations concerning the relative validity of different kinds of 
items now seem to be substantiated. In the interest of consistency
of measurement, those classes of items which measure separately 
whatever the total score represents should be used in greater number 
than other kinds of items having less discriminating value. 

A demonstration of higher discriminative value for one kind of 
new-type question does not justify the elimination of other kinds of 
items with less differentiating power. The materials of a course 
may lend themselves to expression in one kind of item more than
in another. He who has labored with tests acquires the “feel” 
for single-choice, analogy, or single-word completion questions. 
There are times when phases of a course would be neglected if their 
representation depended upon one kind of question. Wisdom does 
dictate, seemingly, greater care in the use of items belonging to a 
class usually producing low validity and it supports the utilization to 
maximum frequency of that kind of question which discriminates 
best. 

Surprisingly little has been done to establish the comparative 
validity or reliability of objective questions in contrast with reliability 
of test scores. A few years ago expert opinion was couched as follows: 
“The measurement of the comparative validity of each type of 
objective question or each type of arrangement of these questions 
into tests is one of the most important problems in the field of objective 
testing. While one of the most important, it is one of the most poorly 
done. The difficulty of determining satisfactory criteria by which 
to measure validity and the number of rather narrow, unrelated 
studies are evidence for the previous statement.” (p. 165.)? More 
recently there have appeared several studies of techniques for deter- 
mining the validity or reliability of items, but actual comparisons 
of types of items are still rare. Two investigators,® have indicated a 
preference for wrong-word questions. Since their contrasts utilized 
thirty-four of these items only, and because these same thirty-four 
items now are included in the two hundred and thirty-seven forming 
a part of this study, it is not necessary to do more than indicate that 
extensive investigation shows the preference as deserving of slight 
generalization. In his evaluation of test items, Anderson! gives the 
percentage of questions eliminated from further use because they 
failed to meet his standards of differentiation. He states, “For true- 
false questions my percentage of elimination is 38.3; for multiple- 
choice questions thirty-six; for completion questions thirty; for 
matching questions thirty-two; for wrong-word questions twenty-nine; 
and for essay questions twenty-nine. It is obvious that a fair pro- 
portion of an examination, not subjected to item analysis, makes 
no contribution to the final score and is valueless.” (p. 119.) It is 
regrettable that the number of items evaluated is not given, but it 
needs to be stated that the comparisons quoted are incidental to 
the burden of the article.’ They do show the inutility of many test 
items and the superiority of certain kinds of items. 
It might be argued, as Lindquist has done, (p. 475)* that differences
in validity or reliability of types of tests are reflections of the skill or 
ingenuity of the test-builder rather than products intrinsically depend- 
ent upon the structure of questions. Although the contention cannot 
be refuted when tests are constructed by one individual, it is reasonably 
demonstrable that types of questions gain some of their value from 
other sources than the important one of the builder’s skill. If test- 
builders are more ingenious or skilful with certain kinds of questions, 
then we need to know which type of question commands skill from 
the largest number of workers. It is likely that the problem set the 
test-builder by the form of question itself has something to do with 
the outcome, or, in this connection, with the validity or reliability of 
tests. Lindquist’s criticism merely sharpens observation. If followed 
to a logical conclusion, his position leads to experimental nihilism. 
No conclusive study of reliability or of validity can exist because the 
personal equation is uncontrollable. Essentially this is Lindquist’s 
assertion for he states it is “unlikely that empirical studies will
contribute very much to the better evaluation of the various types of
objective exercises” (p. 475). There may be an escape, however,
from the rigidity of argument. 

Does not the argument lose force when new-type tests are forged by 
many coöperating workers, each one under constraint to do his best in
producing a set number of items of each type? If under these condi- 
tions certain kinds of items enjoy greater reliability, then, either it 
seems admissible that lacking specific training graduate assistants 
possess greater skill for one kind of item, or, that the internal structure 
of certain types of items lends itself readily to mastery by test-builders. 
If differences in skill lead to better questions, when the contributors 
are meeting an assigned task, this is precisely what we wish to find out. 
New-type tests demand those items reflecting the greatest amount 
of skill or those exercises which will tap such skill. The measures of 
skill are not precision or beauty of phraseology. They are the more 
prosaic and more pragmatic measures of reliability or validity. 

Graduate students assigned the task of preparing a specified number 
of each kind of new-type questions are not governed by interest in 
establishing the superiority of one type over another. They are 
alert, however, to the appropriateness of certain types of questions 
for various materials and concepts comprising subject-matter. And, 
it must be admitted, they develop preferences for certain kinds of 
items. Graduate students are not averse to saving time. They
form judgments about the ease of making out items and about the
time required to fulfill each phase of an assignment. Yet they enter- 
tain no settled conviction that their order of preference determines the 
order of reliability. Their attitude is one of doing satisfactorily that 
number of items assigned to them. But when asked to vote about the 
ease of constructing questions, they do vote predominantly for that 
kind of question which empirically is shown most discriminative. 
They are most skilful in the easiest kind of question to prepare. 

Recently seventeen graduate assistants and instructors, all having 
had considerable experience in making out new-type questions, were 
asked to rate three kinds of questions in order of ease of construction. 
The order in which the classes were given to the raters was varied to 
control the influence of position. Single-choice recognition, single- 
word completion, and analogy questions were rated. Most difficult 
to prepare were the analogy questions, according to all seventeen of 
the raters, and easiest were the single-word completion questions with 
fifteen votes for first place. Later it will become evident that single- 
choice recognition questions, which ranked second, are the least 
valid or reliable, and that analogy questions occupy a middle position 
with respect to reliability. Undoubtedly, greater care is expended 
upon analogy questions, either to invite or avoid subtlety, but they 
are not thereby forged into the most discriminating class of items. 

In connection with the course in General Psychology, at the Univer- 
sity of Minnesota, up to the close of the academic year 1936, there had 
been prepared and filed four thousand, two hundred and two new-type 
questions. More than half of these items have been validated. 
Precisely what is meant by validation will be made clear shortly. 
The number of each kind of question available can best be indicated 
in tabular form. 

Table I indicates a preponderance of single-choice recognition 
questions. Such burdening of the files is not justifiable and must 
go down as a tribute to inertia or to an impetus whose origin has 
been lost but whose influence has operated against the accuracy of 
sampling instruments. Without substantial evidence in measure- 
ment, year after year, the demand was for more single-choice questions 
than for any other kind. Single-word completion questions were 
not introduced systematically into examinations until the year 1933. 
Previously resort had been to multiple-word completion items. 
Now the shorter and simpler completion form, because of high reli- 
ability, has usurped the place occupied by recognition items. The 







246 The Journal of Educational Psychology 


TABLE I.—INVENTORY OF NEW-TYPE QUESTIONS FOR INTRODUCTORY PSYCHOLOGY COURSE

                               Psychology 1*          Psychology 2†
                                             Not                    Not
Type of question            Validated  validated   Validated  validated   Total

Single-choice                     413        409         510        415    1747
Analogy                           339        250         179        267    1035
Wrong-word answer                 140         91          97         73     401
Single-word completion            230        180         355        254    1019
Total                            1122        930        1141       1009    4202

* Psychology 1 includes questions on the following topics: Introduction,
anatomy and function of the nervous system, sensation, levels of reaction, motives,
emotion, attention, conditioning, learning, and memory.

† Psychology 2 includes questions on the following topics: Imagery, association,
testimony, suggestion and hypnosis, will and action, personality,
character-analysis, intelligence, mental tests, individual differences, correlation,
and abnormal psychology.


wrong-word answer questions represented an innovation not to 
persist beyond the tenure of office of their original sponsor, hence the 
relatively small percentage of these items in the files. That matching 
questions do not appear in the table is due to their too low frequency for 
comparative purposes. Had the results of investigation been applied 
earlier, the questions available would have been different both in 
character and examination value. This comment is intended to 
serve merely as a warning against substituting unsupported judgments 
for empirically controlled ones. 

Because the validity or reliability of items is dependent in part 
upon the care devoted to their construction, it seems essential to 
describe briefly the process of preparing them. Graduate assistants 
are assigned relatively small segments of textbook material, rarely 
more than two chapters and often less, or else they are assigned to 
lectures, and are given approximately one week’s notice to turn in a 
specified number of items. The number is never large. The final 
examination notice for Psychology 2, 1935, for example, called for 
four analogy, four single-choice, and six single-word completion
questions. Usually two assistants cover the same segment although
they do not coöperate with each other. Emphasis is put upon the










The Comparative Validity of New-type Questions 247 


minimum number of questions called for and the assistants are told 
that the items “are expected to be ‘fool proof,’ clear, pointed, brief,
and with answers not too obvious.” These items, the work of
approximately fifteen people, are then examined by a committee of four
members, two of whom are lecturers in the course. Each member
passes judgment upon every item and indicates by symbols his
acceptance or rejection. When items appear to have enough merit to invite
modification, committee members coöperate. Questions receiving
majority approval are accepted and typed on cards. But not all 
accepted questions will get into examinations, for another committee, 
having the burden of organizing tests, will eliminate duplicate and 
overlapping questions. It must appear, then, that an item must 
hurdle many barriers to become a part of an examination. This 
point is important to the discussion of validity. After receiving 
approval of its maker, securing support from at least three members 
of a committee, and running the gauntlet of another committee which 
determines the number of items bearing upon each topic in the course, 
a question may turn out to have no discriminating value. 

During the past few years, it has been possible to determine the 
discriminating value of examination questions. Whether it is correct 
or not to write of validity or of reliability cannot now be decided. 
There is a considerable literature bearing upon this matter, reference 
to which may be found in the summary of Lee and Symonds’ and 
in a number of an educational research journal devoted entirely to 
educational tests.* It is necessary, however, to point out what is 
meant by validity in this study. 

After examinations have been corrected, it is usually the custom 
to assign grades to arbitrarily determined ranges of scores. In this 
case, grades ranging from A through F represent highest to lowest 
scores. The validity of a test item is found by computing the per- 
centage of students from each letter-grade group (A, B, C, D, and F) 
that correctly answers it. To obtain these validations, tests are sorted
into five piles according to the letter grade. Since papers having 
A, B, and C grades show more accuracies than errors, the errors are 
tabulated. With a knowledge of the number of A papers, for example, 
it is easy to secure the number of correct answers for each question 
by a process of subtraction. Conversely, since D and F grade papers 
contain more errors than accuracies, the number of correct answers 
to each item is tabulated. With the number of correct and incorrect 
answers made by each letter-grade group, the task of converting these 




numbers into percentages is a one-step process. It has been customary 
to indicate the percentage of each letter-grade group passing an item 
on the back of the card on which the question is recorded. 
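
The tabulation just described may be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a record of how the clerical work was actually done; the function name `percent_passing`, the group sizes, and the tallies are all hypothetical.

```python
def percent_passing(group_size, tally, tallied="errors"):
    """Return the percentage of one letter-grade group passing an item.

    For A, B, and C papers the errors are tallied and the number correct
    is found by subtraction; for D and F papers the correct answers are
    tallied directly.
    """
    if tallied == "errors":
        correct = group_size - tally
    elif tallied == "correct":
        correct = tally
    else:
        raise ValueError("tallied must be 'errors' or 'correct'")
    return round(100 * correct / group_size, 1)


# Hypothetical group sizes and tallies for a single item.
groups = {
    "A": (50, 4, "errors"),
    "B": (75, 12, "errors"),
    "C": (240, 72, "errors"),
    "D": (75, 36, "correct"),
    "F": (60, 18, "correct"),
}
validation = {g: percent_passing(n, t, how) for g, (n, t, how) in groups.items()}
print(validation)
```

The resulting dictionary corresponds to the five percentages entered on the back of a question card.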

If an instructor desires his questions to distinguish consistently 
groups of students who traditionally must be separated into five 
categories, then those items accomplishing the distinction are valid 
items. Perhaps the boldness of the assertion invites consideration 
of the function of examinations—an issue worthy of consideration 
but not a part of this investigation. Examinations are at present 
primarily achievement tests; measures of what has been done; means 
for accomplishing grade separation. They are rarely incitements to 
correct misinformation or to tap motivation in the interest of better 
learning. Questions, then, which secure grade separation are valid 
questions. But by the same token they are reliable items. Each 
is accomplishing that which is represented by the total score. 

A five-step system of validation is a severe criterion which puts 
a premium upon quality. The fewer divisions, the fewer the items 
failing to discriminate perfectly; the more divisions, the greater is the 
possibility that items will show letter-grade displacements. Since the 
additional labor needed to secure five categories rather than three is 
slight, and since prevailing academic practices make use of five letter- 
grades or their equivalent, it was deemed practical to reflect the grading 
system. 

Use of the total score in a specific examination, rather than the 
final grade in a course, was taken as a standard for validating items. 
This is the practice prevailing wherever achievement tests are vali- 
dated. In using the grade in a particular test, an instructor is holding 
constant such factors as attitude, time, and the conditions under 
which the test was given. Had the final grade been used, it would 
have reflected factors never given a chance to operate in any one test. 
It is questionable if irrelevant conditions should be allowed to deter- 
mine the validity of items intended to discriminate knowledge at 
one rather than at another phase of a course. Repetition of items, 
with new validations, permits the formation of a judgment of
mastery of important concepts by students. 

Table I shows that more than twenty-two hundred items, falling 
within limits of four types of questions, have been validated. The 
omission of the number of students taking examinations is an oversight 
not permitting correction. When validations were recorded on cards, 
the number of students in each grade range was omitted. Practical 
rather than research demands were basic to the validations. The
omission is neither fatal to the value of the information for making 
out new examinations nor to the forming of generalizations dealing 
with the relative discriminating power of types of questions. It 
does handicap statistical delicacy. Large numbers of students, 
usually more than five hundred, took each examination. Since letter 
grades closely approximate the following proportions from A to F 
grades, namely, percentages of 10, 15, 48, 15, and 12, it is easy to 
estimate with fair accuracy the number of students represented when 
attention must be paid to the letter-grade categories. There are a 
sufficient number of students represented in each category to justify 
confidence in the data. 
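
The estimate of group sizes can be illustrated as follows. The percentages are those quoted above; the class size of five hundred is an assumed round figure, since the actual enrollments were not recorded, and the function name is hypothetical.

```python
# Approximate percentage of students receiving each letter grade (from the text).
GRADE_PERCENTAGES = {"A": 10, "B": 15, "C": 48, "D": 15, "F": 12}


def estimated_group_sizes(class_size):
    """Estimate the number of students in each letter-grade group."""
    return {
        grade: round(class_size * pct / 100)
        for grade, pct in GRADE_PERCENTAGES.items()
    }


sizes = estimated_group_sizes(500)  # 500 is an assumed class size
print(sizes)
```

Under this assumption even the smallest group, the A students, numbers about fifty.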

Questions showing decreasing percentages of each group from the A 
grade through the F grade may be said to discriminate perfectly. 
They may not all possess marked discriminating capacity, for some 
questions separate letter-grade groups by slight amounts. Such 
questions can be expected upon repetition to vary so that some will 
be removed from the class of perfect discrimination. The stability 
of items, however, will be considered separately in a succeeding 
publication. Concern is now with the problem of difficulty of various 
kinds of items. Under the category of perfect discrimination are 
included items which clearly separate each group of students from 
its neighbors and those which barely do so, as well as items passed 
by the majority or by the minority of each letter-grade group. One 
question, for example, may show the following percentage of each 
group passing it, A 98, B 93, C 90, D 88, and F 80. It is an easy 
question and one likely to show reversals upon repetition. Another 
item shows good discrimination and yet is difficult since the percentage 
of each group passing is A 55, B 40, C 32, D 18, and F 6; whereas still 
another meets the criterion excellently as follows: A 91, B 82, C 66, 
D 48, and F 37. Perfect discrimination means, simply, that decreasing 
percentages of students from A grade to F grade pass the items. 
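
The criterion may be stated compactly in code. The sketch below checks the three sets of percentages quoted above and also reports the smallest gap between neighboring groups, as a rough indication of how narrowly an item meets the criterion. The function names are hypothetical, and the smallest-gap measure is an illustrative addition rather than an index used in the study.

```python
def adjacent_gaps(percentages):
    """Differences between each grade group and the next lower one (A-B, B-C, ...)."""
    return [hi - lo for hi, lo in zip(percentages, percentages[1:])]


def discriminates_perfectly(percentages):
    """True when the percentage passing falls at every step from A to F."""
    return all(gap > 0 for gap in adjacent_gaps(percentages))


examples = {
    "easy": (98, 93, 90, 88, 80),
    "difficult": (55, 40, 32, 18, 6),
    "excellent": (91, 82, 66, 48, 37),
}
for label, pcts in examples.items():
    print(label, discriminates_perfectly(pcts), min(adjacent_gaps(pcts)))
```

The easy item passes the test with a smallest gap of only two points, which is why it is the one most likely to show reversals upon repetition.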

When questions are passed by a higher percentage of a lower grade 
group than the immediately higher grade group, for example, a higher 
percentage of B students than of A students pass an item, they are 
put in the category of one-letter grade displacement. These questions, 
in the majority of cases, are useful to the examination. As will 
be emphasized later, many fail to separate two groups of students by 
slight margin although they distinguish adequately all remaining 
divisions. 
If more than one inversion occurs, then the criterion of validity
used is whether or not the question differentiates the combined A-B 
grade students from the D-F students. Two illustrations will aid 
definition. The percentages of each letter-grade group passing items 
were 42, 44, 30, 36, 28, and 80, 80, 72, 72, 68. It is evident that a 
question differentiating A-B from D-F students need not have marked 
validity. The large middle group may excel the A-B group, or the 
F students may do better than the D and even the C grade groups 
and yet the A-B category can be superior to the D-F one. Unless the 
degree of separation is extreme, he who has observed the inconsistencies 


TABLE II.—ANALYSIS OF VALIDATIONS OF FOUR NEW-TYPE QUESTIONS

                                    Single-                Wrong-word  Single-word
                                     choice      Analogy       answer   completion
Degree of discrimination         No.    Per   No.    Per   No.    Per   No.    Per
                                       cent         cent         cent         cent
Psychology 1.
  Perfect discrimination         207   50.1   187   55.1    73   52.1   165   71.7
  One letter-grade displacement  128   31.0   102   30.1    40   28.6    48   20.9
  A-B from D-F students           43   10.4    26    7.7    12    8.6     6    2.6
  No discrimination               35    8.5    24    7.1    15   10.7    11    4.8
Psychology 2.
  Perfect discrimination         191   37.4    80   44.6    43   44.3   219   61.7
  One letter-grade displacement  154   30.2    44   24.6    22   22.7    78   22.0
  A-B from D-F students           60   11.8    20   11.2    10   10.3    28    7.9
  No discrimination              105   20.6    35   19.6    22   22.7    30    8.4





























possible can entertain no confidence in questions incapable of finer 
discrimination. An item can be poor indeed and yet separate the 
highest and lowest quarters of a class. 

Questions may turn out to have no value for separating levels of 
academic achievement. The following three samples, expressed in 
percentages, cover the range of difficulty from easy to hard questions 
without any one of them serving validation: (1) A 98, B 93, C 94, 
D 95, and F 96; (2) A 56, B 53, C 52, D 64, and F 67; and (3) A 25,
B 23, C 25, D 23, and F 35. Items having no power of discrimination 
creep into examinations with surprising frequency in spite of vigilance. 
Occasionally, yet rarely enough not to justify a new category, one
finds items showing perfect inverse discrimination. Such a question
was the following: “Red + black = (1., richer color-tone; 2., decrease
in saturation; 3., increase in chroma; 4., a blend).” It was answered
correctly by these percentages of students, A 33, B 34, C 38, D 42, and 
F 56. The discriminating values for all items analysed are presented 
in Table II. 
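
The four degrees of discrimination can be brought together in one small routine. What follows is a sketch under stated assumptions, not the procedure as originally carried out. Equal percentages in neighboring groups are counted as inversions, which agrees with the illustration 80, 80, 72, 72, 68; only neighboring groups are compared; the combined A-B and D-F percentages are weighted by the approximate grade proportions of 10, 15, 48, 15, and 12, since the actual group sizes were not recorded; and the label "no discrimination" and all function names are supplied here for illustration.

```python
# Approximate percentage of students in grades A, B, C, D, F (from the text).
# Using them as pooling weights is an assumption of this sketch.
WEIGHTS = (10, 15, 48, 15, 12)


def count_inversions(p):
    """Count adjacent grade pairs in which the lower group does at least as well."""
    return sum(1 for higher, lower in zip(p, p[1:]) if lower >= higher)


def pooled(p, positions):
    """Weighted mean percentage passing for the grade groups at `positions`."""
    total_weight = sum(WEIGHTS[i] for i in positions)
    return sum(WEIGHTS[i] * p[i] for i in positions) / total_weight


def classify(p):
    """Assign an item's five percentages (A through F) to one of four classes."""
    inversions = count_inversions(p)
    if inversions == 0:
        return "perfect discrimination"
    if inversions == 1:
        return "one letter-grade displacement"
    if pooled(p, (0, 1)) > pooled(p, (3, 4)):
        return "A-B from D-F"
    return "no discrimination"


for item in [(91, 82, 66, 48, 37), (42, 44, 30, 36, 28),
             (80, 80, 72, 72, 68), (56, 53, 52, 64, 67),
             (33, 34, 38, 42, 56)]:
    print(item, "->", classify(item))
```

On this reading the perfectly inverted item on color mixture falls with the items of no value, in keeping with the decision not to give such items a category of their own.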

The outstanding result of validating is the striking superiority of 
the single-word completion or recall type of question. This superiority 
holds for both halves of the course and it is shown not only by items 
which discriminate perfectly but also by the infrequency with which 
these items fail to discriminate. Questions considered by graduate 
students as easiest and least time-consuming to make out also appear 
to serve best the separation of students into grade sequences. Rarely 
have these questions contributed a fourth of the examination content, 
most often less and only once more. Yet they have accomplished 
separately whatever is represented by the total examination score 
to a degree far in excess of the single-choice items which have been 
used liberally. The completion questions would have been more 
discriminating with a numerical representation equal to that of single- 
choice items. 

There are no striking differences among the three types of recogni- 
tion questions. In view of the consistency manifested, the single- 
choice recognition question contributes least to grade separation, 
and the analogy questions are slightly better than the wrong-word 
answer questions. The similarities, however, far outweigh the differ- 
ences. They emphasize again the likelihood of wishful thinking 
leading the maker of tests astray. Often analogy questions are 
lauded as instruments eliciting thinking, the implication being that 
the student does the thinking. This bit of projection has its humorous 
touch. The test-builder knows that he labors long and hard to 
satisfy the demands of analogies, and he assumes that students’ 
complaints, following periods of wrinkled brows, bear testimony to 
the excellency of these questions. If examinations are exercises in 
thinking, it might be possible to show that analogies tap thought 
processes better than all other questions. They do not thereby 
achieve better separation of students in terms of whatever our new- 
type tests measure. Often one wonders if the exercise of thinking 
is not divorced from course content by the subtle expressions necessary 
to hide the kernel of learning. At least single-word completion 
questions more often convince the student of his academic nakedness
and of the desirability of having a few concepts within which to do 
his thinking. Wrong-word answer questions have not fulfilled their 
earliest promise.’ They yield a slightly larger percentage of good and 
of poor questions than is characteristic of single-choice items! 

Somewhat challenging contrasts appear between examinations 
given in the two halves of the course. All four kinds of questions 
are less valid in Psychology 2. Perhaps this result is one to be 
expected because the lowest ten per cent of the students have been 
eliminated by the end of the first half of the course and before the 
final examination in the second half this number has been augmented. 
Homogeneity of the group may carry the full burden of explanation. 
The greater refinement of questions demanded by more selected 
students is not matched by the effects of practice in making out tests. 
Because total scores in both phases of the course examinations were 
distributed normally and over equally wide ranges, and because of the 
care devoted at all times to the preparation of items, the inequality 
in validity of questions actually was not realized. True, members 
of the committees felt that the task of making out items and of 
selecting them was made more difficult by the less rigid, less specific, 
or more conceptual content of Psychology 2. Yet items finally 
accepted for inclusion in tests received endorsement not less enthu- 
siastic than that prevailing in Psychology 1. The discrepancy empha- 
sizes the need for validating items. 

Each kind of question occupies the same position in the hierarchy of 
perfect discrimination, but single-choice items seem to suffer the 
greatest decrement. Single-word completion questions enjoy a 
superiority even greater than that of any other kind during the more 
favored half of the course. The most significant contrast exists 
among questions having no discriminating value. Of the three 
kinds of recognition questions, one out of every five items is without 
value; of completion questions, one out of every twelve. Certainly 
the builders of these examinations never entertained the conviction 
that so many items could be useless after careful preparation. Perhaps 
too great care is its own defeat. 
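
The proportions just cited can be checked against the counts in Table II. The sketch below assumes that the figures of one in five and one in twelve refer to Psychology 2, which is the half of the course whose counts they match; the variable and function names are hypothetical.

```python
# Psychology 2 counts from Table II:
# (items with no discriminating value, items validated)
NO_VALUE = {
    "single-choice": (105, 510),
    "analogy": (35, 179),
    "wrong-word answer": (22, 97),
    "single-word completion": (30, 355),
}
RECOGNITION = ("single-choice", "analogy", "wrong-word answer")


def pooled_rate(kinds):
    """Proportion of items without discriminating value across the given kinds."""
    useless = sum(NO_VALUE[k][0] for k in kinds)
    validated = sum(NO_VALUE[k][1] for k in kinds)
    return useless / validated


recognition_rate = pooled_rate(RECOGNITION)
completion_rate = pooled_rate(("single-word completion",))
print(f"recognition: {recognition_rate:.3f}, completion: {completion_rate:.3f}")
```

The recognition items come to 162 of 786 and the completion items to 30 of 355, close to one in five and one in twelve respectively.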

Inspection of Table II shows that from one-fifth to approximately 
one-third of all questions are one letter-grade displacement items. 
These items are useful because they contribute to the separation of 
most letter-grade groups and because many of them fail to discriminate
perfectly by only slight margins. Stress should again be put upon the
fact that only percentages of letter-grade groups passing each item
were available; consequently the organization of differences in per- 
centage into tabular form merely emphasizes the borderline value of 
certain questions. If four per cent more D-grade students than 
C-grade students pass a question, there are fewer students included 
in the overlapping than when four per cent more C-grade students 
than B-grade students pass the question. In spite of errors inherent 
in combining unequal units of measurement, the process does serve 
to indicate the utility of many questions belonging to the four types. 
Table III represents the analysis of grade displacement questions. 


TABLE III. ANALYSIS OF QUESTIONS SHOWING ONE LETTER-GRADE DISPLACEMENT

Amount of displace-     Single-choice    Analogy          Wrong-word       Single-word
ment in percentage                                        answer           completion
                        No.  Per cent    No.  Per cent    No.  Per cent    No.  Per cent

Psychology 1
   0- 4                  88    68.7       72    70.5       29    72.5       36    75.0
   5- 9                  28    21.9       21    20.6        8    20.0       11    22.9
  10-14                   8     6.3        7     6.9        3     7.5        1     2.1
  15-19                   4     3.1        1     1.0        0     0.0        0     0.0
  20-24                   0     0.0        1     1.0        0     0.0        0     0.0
Psychology 2
   0- 4                  87    56.5       27    61.4        9    40.9       50    64.1
   5- 9                  46    29.9       10    22.7        7    31.8       17    21.8
  10-14                  11     7.1        2     4.5        4    18.2        9    11.5
  15-19                   8     5.2        3     6.8        2     9.1        2     2.6
  20-24                   2     1.3        2     4.5        0     0.0        0     0.0

There is no certain criterion for selecting the best of the large 
group of displacement questions. In actual grading practice, many 
C-grade students are indistinguishable from B-grade students. Bound- 
ary lines are tenuous. But we cannot tell whether students with 
the highest scores in their respective letter-grade divisions contributed 
chiefly to these displacements. The hypothesis that they did so is 
amenable to future investigation. It is true that many of these 
items yield perfect discrimination upon repetition with the same or 
different classes of students. A severe standard of acceptance might 
be that one which rejects all items showing five per cent or more of a
lower letter-grade group passing the item with greater success than
the higher grade group. This would add a considerable number of 
questions to those of certain utility. 
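
The classification discussed here can be sketched in code. What follows is only an illustration under stated assumptions, not the procedure the authors carried out: each item is taken to be described by the percentage of the A, B, C, D, and F groups passing it; "perfect discrimination" is assumed to mean that these percentages fall strictly from each group to the next; and a "one letter-grade displacement" item is assumed to show exactly one adjacent pair in which the lower group equals or exceeds the higher. The function names and the sample percentages are hypothetical.

```python
from typing import List, Sequence, Tuple


def classify_item(pct_passing: Sequence[float]) -> Tuple[str, float]:
    """Classify one item from the per cent passing in groups A, B, C, D, F.

    Returns a label and the largest inversion in percentage points.
    The definitions are assumptions made for this sketch.
    """
    # An inversion: the lower group passes the item at least as often
    # as the adjacent higher group.
    inversions: List[float] = [
        low - high
        for high, low in zip(pct_passing, pct_passing[1:])
        if low >= high
    ]
    if not inversions:
        return "perfect discrimination", 0.0
    if len(inversions) == 1:
        return "one letter-grade displacement", inversions[0]
    return "other", max(inversions)


def passes_severe_standard(pct_passing: Sequence[float],
                           threshold: float = 5.0) -> bool:
    """Accept an item only if its single displacement is below `threshold`."""
    label, size = classify_item(pct_passing)
    return label != "other" and size < threshold
```

With hypothetical percentages of 90, 80, 84, 60, and 50, the C group exceeds the B group by four points; the item counts as a one letter-grade displacement and survives the five per cent standard, whereas a six-point inversion would not.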

The contrasts among displacement items are not unlike those 
existing for questions which discriminated perfectly. In spite of the 
rigid selection through which displacement items are obtained, a 
selection which accounts for the similarities among the types of 
questions, there is a definite tendency for questions in Psychology 2 
to exhibit greater overlapping of correct answers among letter-grade 
groups. In this respect, wrong-word answer questions exhibit the 
largest discrepancy. It has been assumed that the greater homo- 
geneity of the students in Psychology 2 is responsible for these 
differences. 

Possibly an analysis of the kinds of students contributing the 
displacement questions will help in accounting for a persistent differ- 
ence in test items from one half of a course to another. Table IV 
has been prepared to test the hypothesis of homogeneity. 


TABLE IV. LOCATION OF ONE-LETTER DISPLACEMENT ITEMS

                                          Type of question
Questions passed by     Single-choice    Analogy          Wrong-word       Single-word
more                                                      answer           completion
                        No.  Per cent    No.  Per cent    No.  Per cent    No.  Per cent

Psychology 1
  B than A students      48    37.5       38    37.2       16    40.0       17    35.4
  C than B students      17    13.3       19    18.7        3     7.5        5    10.4
  D than C students      16    12.5       14    13.7        5    12.5        3     6.3
  F than D students      47    36.7       31    30.4       16    40.0       23    47.9
Psychology 2
  B than A students      49    31.8        8    18.2        5    22.8       18    23.0
  C than B students      11     7.1        4     9.1        2     9.1       14    18.0
  D than C students      25    16.2       14    31.8        3    13.6       22    28.2
  F than D students      69    44.9       18    40.9       12    54.5       24    30.8

If attention is centered upon the lowest grade groups, then it does 
appear that the greater homogeneity of the group is an important 
cause of the lower validity of items. In the second half of the course, 
relatively more students with examination grades of D and F overlap
their higher grade groups, respectively. The elimination of the
lowest ten per cent of the class during the first half has tended to 
equalize accomplishment and perhaps ability. When the highest 
grade group is considered, the contrasts still are favorable to the 
hypothesis of homogeneity. The examinations, which include the 
items under review, follow at intervals of approximately one month 
and during the course there are six of them. It seems reasonable to 
assume that early in the course there will be capable students who 
have not reached their academic stride. They may be potentially A 
students but their first examination mark is, perhaps, B grade. At 
the beginning of Psychology 2 these students have more closely 
approximated their maximum relative achievement. They have 
been awarded a course grade of A or B. Their relative positions with 
respect to each other can be expected to be more stabilized than 
during the first part of the course. Fewer examination items, there- 
fore, should be passed by more B than A students. Inspection of 
these two categories in Table IV indicates that each type of question 
shows markedly fewer overlappings in the second half of the course. 
Other factors may operate to produce the differences largely attributed 
to the homogeneity of the group, but as yet they cannot be subjected 
to analysis. 

Of more than passing interest is the observation that although the 
C-grade group of students is by far the largest, it contributes relatively 
few students to the process of letter-grade displacement. This point 
mollifies the objection to using percentages for practical consideration 
in this particular study. Of greater import is the light thrown upon 
letter-grade differentiations. A rather large number of items in 
our new-type tests are more within the grasp of the lowest grade 
students than of higher grade students. One is left with the query, 
“What does a letter-grade mean?” 

As a final contrast, it may be of value to assemble certain data into 
new categories. If to the questions discriminating perfectly are added 
those failing to do so by an amount of overlapping less than five per 
cent of any letter-grade group, it may be conceded that all of these 
items discriminate excellently. The process must of necessity penalize, 
by comparison, the completion questions, but the more comprehensive 
grasp of the value of new-type tests is the objective sought. A 
second class of questions, of doubtful discriminative value, would be 
made up of one letter-grade displacement questions showing over-
lapping of greater than five per cent; and the third division would
comprise all items capable of separating only A-B groups from D-F
groups of students together with those unquestionably useless. This
summary is represented in Table V.

TABLE V. A SUMMARY OF VALIDATIONS FOR FOUR NEW-TYPE QUESTIONS

                                              Type of question
Discriminative value        Single-choice    Analogy          Wrong-word       Single-word
                                                              answer           completion
                            No.  Per cent    No.  Per cent    No.  Per cent    No.  Per cent

Excellent                   573    62.0      366    70.5      154    65.0      470    80.3
Doubtful                    107    11.6       47     9.1       24    10.1       40     6.8
A-B from D-F, or useless    243    26.3      105    20.3       59    24.9       75    12.8
Total                       923    ....      518    ....      237    ....      585    ....


SUMMARY AND CONCLUSIONS

A study of the comparative validity of two thousand two hundred
sixty-three new-type questions, belonging in the classes of single-choice,
analogy, wrong-word answer, and single-word completion
questions has been presented.

The technique of item analysis used was probably as severe a test
of the value of items as exists. Each question was examined for its
value in discriminating students according to the letter-grade
distribution prevailing in academic practice. The objective was to
discover whether or not items belonging to different classes of
new-type questions would be passed by decreasing percentages of students
classified by the total examination score as A-, B-, C-, D-, and F-grade
students.

Analyses of the validations, according to three criteria (namely,
perfect discrimination, one letter-grade displacement, and the
combined ratings of items which differentiate A-B from D-F students and
those items failing absolutely to separate letter-grade categories),
show the most valid kind of question to be the single-word completion
form. Between sixty-two and seventy-two per cent of the single-word
completion questions satisfy the criterion of perfect discrimination,
whereas only thirty-seven to fifty per cent of the single-choice;
forty-five to fifty-five per cent of the analogy; and forty-four to fifty-two per
cent of the wrong-word answer questions have such discriminative
ability.

Although given equal care in construction and selection, twenty-six 
per cent of the single-choice, twenty per cent of the analogy, and 
twenty-five per cent of the wrong-word answer questions are com- 
pletely invalid, whereas only thirteen per cent of the single-word 
completion questions lack discriminative character. 
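
The percentages quoted in this paragraph follow from the counts in Table V. As a short arithmetic check, using only those counts (the dictionary layout and variable names are illustrative, not part of the study):

```python
# Counts from Table V: items in the least valid class, and all items, per question type.
third_class = {
    "single-choice": 243,
    "analogy": 105,
    "wrong-word answer": 59,
    "single-word completion": 75,
}
totals = {
    "single-choice": 923,
    "analogy": 518,
    "wrong-word answer": 237,
    "single-word completion": 585,
}

# Per cent of each type falling in the least valid class, to one decimal place.
percent_invalid = {
    kind: round(100 * third_class[kind] / totals[kind], 1) for kind in totals
}

# Total number of questions analyzed in the study.
grand_total = sum(totals.values())
```

The rounded percentages agree with the twenty-six, twenty, twenty-five, and thirteen per cent stated in the text, and the grand total equals the two thousand two hundred sixty-three questions reported.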

When inversions occur, that is, when a lower letter-grade group
excels a higher letter-grade group in specific questions, they appear
most frequently in the F-D grades and at the A-B level; least often
do C-grade students overlap the B-grade level.

It is not proper to assume that because questions of certain kinds 
have been shown to have a designated quality of discrimination during 
the first half of a course they will not suffer decrement as students
with lower academic achievement are eliminated. The more homo- 
geneous the group, the less will different kinds of new-type questions 
serve the end of grade separation. Single-word completion questions 
can be expected to resist the negative effects of homogeneity upon 
discrimination to the most satisfactory extent. 

If all criteria available in this investigation are considered, the 
single-word completion (recall) type of question ranks highest in 
discriminating, and, among recognition types, analogy, wrong-word
answer, and single-choice questions follow in order with no great 
advantage belonging to any one. 

Empirical studies of validity or reliability of items where the 
ingenuity or skill of test-builders does not weigh too heavily to forbid 
contrasts among types of questions, have been few in number. The 
cooperative enterprise of many workers reduces the influence of the
personal equation to a minimum so that the comparisons yielded 
in this study should have generalized meaning to instructors seeking 
the most diagnostic form of question. Predilection for one type of 
question rather than for another is likely to be useless when supported 
merely by logical considerations. Empirical studies can do much to 
enhance the usefulness of new-type examinations.


A STUDY OF CERTAIN FEARS AND WISHES AMONG
DEAF AND HEARING CHILDREN* 


RUDOLF PINTNER AND LILY BRUNSCHWIG 
Teachers College, Columbia University 


THE PROBLEM 


Studies of the self-expressed fears and wishes of children are of 
value for the insight they may afford into the nature and level of 
emotional adjustments. In the present investigation certain self- 
reported fears and wishes of deaf and hearing children are studied and 
compared. 

The specific questions to be considered are as follows: (1) What
is the relationship of scores on a test of fears and wishes with age and 
some other factors? (2) How do deaf and hearing children compare 
with respect to their expression of certain fears? (3) What differences 
are found between the expressed wishes of deaf and hearing children, 
when seven sets of wishes are presented, each so arranged as to require 
choosing between a desire for the immediate fulfillment of a smaller 
good and a desire for a delayed, but greater good? 


THE TESTS 


The measures devised for this study consist of a brief paper and 
pencil group test of fears and another of wishes. The use of short and 
simply worded statements was necessitated by the restricted language 
comprehension so characteristic of young deaf children. 

The fears test was composed of thirty-nine words and short phrases, 
and was presented to pupils in mimeographed form with the following 
instructions: “Look at all the words on this page and make a mark
in front of each thing that makes you afraid.” The score was
the total number of words checked by each subject.

The thirty-nine items of the fears list had been gathered for the 
most part from frequently occurring responses among more than 
one hundred fifth- and sixth-grade normal hearing public elementary- 
school children to the completion question, “I am afraid when ____.”
This statement had formed part of a data sheet administered to a 
large population in connection with a battery of tests. Several
relatively neutral stimulus words, such as “Books,” “Girls,” and
“Home,” were added to the fears list for padding.

* Prepared with the assistance of U. S. Works Progress Administration, New
York City, Project 165-97-6999-6041-1025.

The fears list was administered at one sitting with the seven pairs 
of wishes. The latter were suggested by the three sets of wishes 
employed by Washburne* in his study concerned with measuring 
children’s ability to sacrifice an immediate satisfaction for a greater 
future satisfaction. The total number of wishes expressed by each 
child for immediate rather than for delayed satisfaction was taken 
as the score on the wishes test. 

The items included in the fears test and the list of wishes are shown 
in Tables V and VI, respectively. 


THE SUBJECTS 


The experimental population consisted of eighty-five deaf boys and 
seventy-four deaf girls from one large public day school for the deaf in 
New York City, and a second smaller suburban New Jersey public 
day school for the deaf. The hearing control group was composed of 
one hundred sixty-eight boys and one hundred seventy-seven girls 
in grades V to VIII inclusive from public elementary schools located 
in the various boroughs of New York City. 

Information with regard to the mean age of the groups and the
auditory status of the deaf is shown in Table I. Deaf boys are on
an average two years older than hearing boys; deaf girls are three
years older than hearing girls. This discrepancy in age between
groups chosen from approximately the same school grades is due to 
the retardation in the language development, and consequently in the 
school progress of the deaf. Had hearing subjects been matched more 
closely with the deaf, samplings from these two populations at the 
elementary-school level would not have been equally representative of 
their respective groups. A selection of chronologically older hearing 
pupils would have resulted in a population weighted by academically 
retarded cases. On the other hand, had the deaf group been limited 
to pupils below the age of fourteen years, the number of available 
cases would have been greatly reduced and confined to a highly 
selected group. 


TEST SCORES OF DEAF AND HEARING GROUPS 


Mean scores on the fears* and wishes tests, and differences between
the scores of the deaf and the hearing are given in Table II. The
information listed in this table includes the number of cases in each
group, the mean number of fears and of wishes indicative of immediate
satisfaction, the standard deviation, the mean difference between
groups, the standard error of these differences, and the ratio of the
mean differences to their standard error.


TABLE I. PERSONAL DATA OF DEAF AND HEARING SUBJECTS

                      Age in years     Age at becoming    Per cent hearing
                                            deaf            in better ear
Subjects              Mean     SD       Mean     SD        Mean     SD
Deaf boys            14.30    1.91      3.97    3.54      22.14   21.44
Hearing boys         12.00    1.35
Deaf girls           14.98    1.84      3.48    3.12      19.60   19.74
Hearing girls        11.97    1.11


TABLE II. DIFFERENCE IN NUMBER OF FEARS AND OF WISHES FOR IMMEDIATE
SATISFACTION OF DEAF AND HEARING CHILDREN

                               Fears test                            Wishes test
Subjects           N     Mean    SD      d     SDd   d/SDd    Mean     SD       d     SDd   d/SDd
Deaf boys          85    8.12   4.68   ....   ....   ....     3.36  [illeg.]  ....   ....   ....
Hearing boys      168    7.58   5.01    .54    .64    .85     2.72    1.71     .64    .22   2.95
Deaf girls         74   13.10   5.43   ....   ....   ....     4.27  [illeg.]  ....   ....   ....
Hearing girls     177   10.06   5.14   3.04    .74   4.11     2.74    1.72    1.53    .21   7.46


For deaf girls, the fears scores range from three to twenty with a
mean of 13.10, for hearing girls from zero to twenty-five with a mean
of 10.06, for deaf boys from zero to twenty-four with a mean of 8.12,
and for hearing boys from zero to twenty-two with a mean of 7.58.
Deaf girls express the largest number of fears and hearing girls the
next largest number, whereas deaf boys rank third and hearing boys
last. The difference between the number of fears reported by deaf
and hearing boys is small. Deaf girls, on the other hand, express a
decidedly larger number of fears than hearing girls, the difference
between the two groups of girls being statistically significant. Sex
differences in the fears scores of the deaf and hearing groups are
marked. Hearing girls average 2.5 more fears than hearing boys,
and deaf girls average five more fears than deaf boys. The ratio
of these differences to their standard error is 4.53 for the hearing and
6.15 for the deaf.

* The reliability coefficient of the fears list, obtained from correlating the odd
with the even numbered items for one hundred sixty-two cases, is .74 for the half
test, and when corrected by the Spearman-Brown Prophecy Formula, .85 for the
whole test. In view of the small number of items included in the wishes test,
reliability coefficients were not computed.
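
The footnote on the reliability of the fears list applies the Spearman-Brown prophecy formula to an odd-even correlation. As a check on that arithmetic, the sketch below uses the standard form of the formula for a test lengthened by a given factor; the function name and the `factor` parameter are illustrative and do not come from the paper.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Predicted reliability when test length is multiplied by `factor`.

    With factor = 2 this reduces to the familiar 2r / (1 + r) used to
    step a split-half correlation up to the whole test.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Odd-even correlation reported for the fears list.
whole_test = spearman_brown(0.74)
```

The result rounds to the whole-test coefficient of .85 given in the footnote.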

On the wishes test, the deaf of both sexes score more unfavorably 
than the hearing; out of the seven pairs of wishes, deaf girls express 
4.27 wishes and deaf boys 3.36 wishes that are indicative of immediate 
satisfaction, whereas hearing boys and girls with almost identical 
average scores of 2.72 and 2.74 express the fewest number of wishes 
for immediate fulfillment. The range in the number of such wishes 
is from one to seven for deaf girls, from zero to seven for hearing girls, 
and from zero to six for the deaf and the hearing boys. Differences 
between the scores of the deaf and the hearing on the wishes test are 
larger than on the fears test, and are for both sexes in favor of the 
hearing. Sex differences are relatively smaller. The standard ratio 
between the number of wishes for immediate gratification for boys 
and girls is 3.87 for the deaf and .11 for the hearing. 
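
The "standard ratio" quoted in this and the preceding paragraph is the ratio of a difference between two means to the standard error of that difference. The paper does not state its formula; the sketch below assumes the usual standard error for independent groups, the square root of SD1²/N1 + SD2²/N2, and takes its inputs from Table II. The function name is hypothetical.

```python
import math


def critical_ratio(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> float:
    """Difference between two means divided by its standard error.

    Assumes independent groups (an assumption of this sketch).
    """
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


# Fears test: girls versus boys within each hearing status (Table II).
hearing_fears = critical_ratio(10.06, 5.14, 177, 7.58, 5.01, 168)
deaf_fears = critical_ratio(13.10, 5.43, 74, 8.12, 4.68, 85)

# Wishes test: hearing girls versus hearing boys.
hearing_wishes = critical_ratio(2.74, 1.72, 177, 2.72, 1.71, 168)
```

Under this assumption the computed ratios agree, to within rounding, with the values of 4.53 and 6.15 reported for fears and .11 reported for the wishes of the hearing group.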

Differences in test scores are much larger between deaf and hearing 
girls than between deaf and hearing boys. Hearing boys have the 
most favorable scores both in fears and wishes, and deaf girls the
poorest.


CORRELATIONS BETWEEN TEST SCORES AND OTHER FACTORS 


Results from correlating the scores of deaf subjects on the fears 
and wishes tests with chronological age, age at becoming deaf, and per 
cent of hearing in the better ear as measured by the 3A audiometer, are 
shown in Table III. As the giving of these two personality tests 
required about twice as long for deaf as for hearing pupils, the school 
time that could be allotted for testing the deaf did not permit the 
inclusion of a measure of mental ability. 

To hearing subjects it was possible to give the Pintner Intelligence 
Test (2). Table IV lists the correlation coefficients of scores on the 
wishes and fears tests with age, mental age, and intelligence quotient 
among hearing boys and girls. 

The correlation coefficients between scores in fears and wishes and 
the factors listed in Tables III and IV are uniformly low and are for 
the most part not reliable. However, among hearing girls fewer fears 
seem to be related to greater maturity in chronological age and mental 
age, and to higher intelligence. Among the deaf of both sexes, slight
negative correlations indicate that pupils deaf since birth or early
childhood have more fears than those who lost their hearing at a later 
age. This relationship appears to be linear. Also, there is the sug- 
gestion of a very slight relationship among deaf boys, but not among 
deaf girls, between fewer fears and larger amounts of hearing in the 
better ear. 


TABLE III. TEST SCORES OF DEAF GROUP CORRELATED WITH AGE, PER CENT OF
HEARING, AND AGE AT BECOMING DEAF

                                                       Deaf population
Measure correlated                                  Boys             Girls

Score on word list (fears) with:
  Chronological age ........................    -.066 ± .073    +.113 ± .077
  Per cent of hearing ......................    -.184 ± .080    +.077 ± .084
  Age at becoming deaf .....................    -.209 ± .075    -.170 ± .086
Score on wishes test (immediate satisfaction)
with:
  Chronological age ........................    +.187 ± .071    +.069 ± .078
  Per cent of hearing ......................    +.055 ± .083    -.020 ± .085
  Age at becoming deaf .....................    -.088 ± .078    -.191 ± .086


TABLE IV. TEST SCORES OF HEARING GROUP CORRELATED WITH AGE AND
INTELLIGENCE

                                                  Normal hearing population
Measures correlated                                 Boys             Girls

Score on word list (fears) with:
  Chronological age ........................    +.043 ± .052    -.194 ± .049
  Mental age ...............................    -.012 ± .061    -.237 ± .058
  Intelligence quotient ....................    -.072 ± .061    -.221 ± .059
Score on wishes test (immediate satisfaction)
with:
  Chronological age ........................    +.136 ± .051    +.103 ± .050
  Mental age ...............................    +.027 ± .061    +.032 ± .062
  Intelligence quotient ....................    -.069 ± .061    -.025 ± .062

None of the correlations between score on the wishes test and 
other factors are reliable. But there is a negligible relationship, 
both among the deaf and the hearing, between a larger number of 
wishes for immediate satisfaction and higher chronological age. Also
there is a faint indication that boys and girls deafened later in child-
hood express somewhat fewer wishes for immediate fulfillment than 
those deaf at an earlier age. 
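
The second figure attached to each coefficient in Tables III and IV appears to be a probable error, although the paper does not say so. Assuming the conventional formula, 0.6745 (1 - r²) / √N, and the group sizes given for the hearing subjects, the tabled values can be checked as follows. The function name is hypothetical.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Conventional probable error of a product-moment correlation.

    Assumed formula: 0.6745 * (1 - r^2) / sqrt(n).
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Fears score with chronological age, hearing group (Table IV).
pe_boys = probable_error_r(0.043, 168)    # hearing boys, N = 168
pe_girls = probable_error_r(-0.194, 177)  # hearing girls, N = 177
```

Both results round to the figures printed beside the corresponding coefficients, .052 and .049, which supports the assumption without proving it.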

In view of the consistently low correlations between chronological 
age and test scores, it may be concluded that the discrepancy of two 
to three years found between the mean ages of the deaf and the 
hearing groups does not account for the differences obtained for these 
groups on the fears list and the wishes test. 


COMPARISON OF THE RESPONSES OF DEAF AND HEARING GROUPS TO 
INDIVIDUAL ITEMS OF THE FEARS TEST 


The frequency of checking each of the items included in the list 
of fears was tabulated and computed for the deaf and the hearing of 
each sex. Table V lists the items of the fears test and the per cent 
of cases in each group checking these words. 

In comparisons of the frequency of responses to items among deaf 
and hearing boys, statistically significant differences are found for 
four items. (Statistically significant differences include those differ- 
ences in which the ratio of the difference between the means to the 
standard error of the difference amounts to 2.50 or more.) Significantly
more deaf than hearing boys state that they are afraid of “Thunder”
and “Policeman,” whereas a larger number of hearing boys indicate
fear of “Robbers” and “Report Card.”
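
The criterion of 2.50 can be illustrated for a single item. The sketch below assumes the usual standard error of a difference between two independent percentages, the square root of p1(100 - p1)/N1 + p2(100 - p2)/N2; the paper does not give its formula, and the printed standard errors in Table V differ slightly from this in places, possibly through rounding. The function name is hypothetical; the figures are those for "Policemen" among boys.

```python
import math
from typing import Tuple


def percent_critical_ratio(p1: float, n1: int,
                           p2: float, n2: int) -> Tuple[float, float, float]:
    """Return the difference between two percentages, its standard error,
    and the ratio of the two, assuming independent groups."""
    se_diff = math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
    diff = p1 - p2
    return diff, se_diff, diff / se_diff


# 13 per cent of 85 deaf boys versus 3 per cent of 168 hearing boys.
diff, se_diff, ratio = percent_critical_ratio(13, 85, 3, 168)
```

For this item the computed ratio exceeds 2.50, in agreement with the tabled value of 2.58 and with the item being counted as a significant difference.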

Statistically significant differences between the responses of deaf 
and hearing girls are more numerous. There are seven items for which 
significantly more deaf than hearing girls express fear; these are
“Mouse,” “Big boys,” “Falling,” “Night,” “Lightning,” “Dark,”
and “Thunder.” In no item do hearing girls indicate a significantly 
larger number of fears than deaf girls. 

It is interesting to note that there is only one item in which the 
deaf of both sexes differ consistently and to a statistically significant 
degree from the hearing, namely in the fear of “Thunder.” Also,
somewhat more of the deaf than of the hearing express fear of “Lightning,”
but the difference for the boys is not statistically significant.

If all differences with a standard ratio of .50 or more are considered, 
it appears that there are eighteen items for which deaf boys express 
more frequent fear than hearing boys, and ten items for which the 
hearing express a more frequent fear. No essential difference between 
the two groups is found in the remaining eleven items.


TABLE V. DIFFERENCE IN THE PERCENTAGE OF DEAF AND HEARING CHILDREN
REPORTING CERTAIN FEARS

Girls 
Fears list eit ed 
Per | > a SDa | d/SDaz dy | SDa | 4/SDa 
cent 
cent 

ee SE eS To Sees 0.5 | 1.00 .50 
GE ce coxacccce 22 | 20 2 5.48 .87 6 6.85 .88 
Ree. ins. bce 64 7+ 81% | 17 6.00 .83 1 4.36 .23 
Big boys............ 18 | 10 7 4.80 .67 23 5.83 | 3.95 
Gectisccecatecs 0 1 1 1.00 .00 3 2.00} 1.50 
ES Ee 1 0.5] 0.5] 1.00 .50 0.5 | 1.00 .50 
Lightning........... 32 + 20 12 6.00 .00 22 6.78 | 3.25 
Diese. ctcxid.. 16 | 10 6 4.58 .31 2 5.56 .36 
sri hts cece ces 16 | 17 1 5.00 .20 e 18 6.71 | 2.68 
Give ssceceks oc 5 3 2 2.82 71] 55 27 6.71 | 4.02 
EO a 28 | 24 4 5.92 .68 | 50 - 23 6.71 | 3.43 
I AN. 12 8 4 4.12 .97 | 22 9 5.47 | 1.65 
Animals............ 22 | 24 2 5.56 .36 | 36 Ss 6.48 | 1.24 
Report card......... 9 23 14 4.58 .06 20 2 5.56 . 36 
Fire..... 52 + 6lee| 9 6.63 36} 83_) 7 5.47] 1.28 
Being alone......... 6 | 13 7 3.74 .87 | 36 8 6.48 | 1.24 
tye eliepaaae. 4 1 3 2.45 22] 12 7 4.12] 1.70 
MIs sc bed cies 2 1 1 1.73 .58 1 0.5 | 1.00 .50 
#4 Snake 61 +67—| 6 6.40 94] 87 10 5.00] 2.00 
Death 72 172-1 © 6.00 79 - 7 5.83 | 1.20 
ee 33 | 16 7 5.83 92] 53 18 6.93 | 2.60 
Water....... 2 3 1 2.00 .50 8 6 3.31] 1.81 
Movies..... 2 1 1 1.73 .58 5 2 2.83 71 
a S 5 3 3.46 .87 | 18 10 5.00} 2.00 
___  peennmel abr pellidode 25 | 24 1 5.74 17] 37 15 6.40] 2.34 
ae 67 ~+ 66—~| 1 6.24 .16 7 4.24] 1.65 
High places 14 | 23 9 5.00 .80 1 5.29 .19 
nn Ee 2 0.5] 1.50) 1.41 .06 0.5 | 1.00 .50 
School. 0 1 1 1.00 .00 2.5/1 2.00] 1.25 
Hospital............ 32 | 27 5 6.16 81 16 6.86 | 2.33 
SN 1 5 4 2.00 .00 4 3.31] 1.21 
Crossing street... .. . 14 / 11 3 4.47 .67 6 5.83 | 1.03 
ae her, Oe ee 75 4 63—/ 12 6.00 .00 5 4.24] 1.18 
ES, 14 5 6 «| 4.24 .42 14 5.66 | 2.47 
Mother 1 3 2 1.73 .16 2.5| 2.00] 1.25 

Dogs. 8 | 4 4 | 3.31] 1.21 9 0 | 4.00] 0 
RG ee a iaets 6 6 0 3.31 7 18 5.38] 3.35 
Father devttds cxad: 1 0.5} 0.5] 1.00 . 50 0. 3.5 | 2.23] 1.57 
Policemen........ |. 13 3 10 3.87 | 2.58 5 6 4.00; 1.50

Among girls, the deaf indicate a more frequent fear in thirty
items. Differences in the remaining nine items are negligible. Hence 
there are no items in which hearing girls are more often fearful than 
the deaf. 

Another way of comparing the deaf with the hearing is by noting 
the items checked by more than fifty per cent of each of the groups. 
Six fears are checked by more than fifty per cent of the deaf boys. 
Exactly the same fears are reported also by more than one-half of the 
hearing boys. There is, however, a slight variation between deaf and 
hearing boys in the ranking of the frequency of these items. Ten fears 
are reported consistently by more than half of the deaf girls, and there 
are six such fears for hearing girls. The six most common fears of 
hearing girls are included in the ten most frequently named fears of 
the deaf girls. Furthermore, these six fears are in complete agreement 
with those of deaf and hearing boys; the only variation from group to 
group being found in the relative order of the frequency of items. 
The six fears named by at least fifty per cent of each of the groups are:
“War,” “Death,” “Bad men,” “Robbers,” “Snake,” and “Fire.”

It is interesting to note in this connection the possible verbal or 
imaginative source for some of these fears in view of the small likelihood 
that urban school children of average or below average socio-economic 
status would have met with such dangers as snakes, robbers, and bad 
men. The widespread fear of war is also worthy of note. A consideration
of these items emphasizes the limitations of the technique
here employed. Whereas the word list of fears has the advantage 
of objectivity, the expression of the range and various manifestations 
of possible fear responses is on the other hand restricted. Moreover, 
the words checked give no definite clue to the relative intensity of 
each of the fears, nor to the nature of the situations in which they 
occur. 

Since the items for this test were gathered from the commonly 
named fears of hearing children, there is a possibility that this list 
draws more heavily upon the fear responses of hearing than of deaf 
children, if fundamental differences between the fears of these two 
groups are assumed to exist. In that event, the fears score of the 
deaf may represent an understatement of fact, and the fears score of 
the hearing a comparative overstatement. 

In the comparisons of the present study, the deaf score as more 
fearful than the hearing of corresponding sex. But since the differ-
ences between the sexes are greater than between the deaf and the
hearing, the influence of deafness on fears seems to be overshadowed
by the factor of sex. The larger number of fears reported by girls is
in keeping with the results obtained by Maller (1) who asked his subjects
to cross out on a given list all the items of which they often felt afraid. 
He found a standard ratio of 8.72 in favor of boys. In the present 
study, in comparison with the relatively slight differences found 
between the fears of the deaf and the hearing boys, the deaf girls 
score as far more fearful and timid than any of the other groups. 


COMPARISON OF DEAF WITH HEARING GROUPS ON THE ITEMS OF THE 
WISHES TEST 


The distribution of the responses of deaf and hearing pupils to the 
seven sets of paired wishes is shown in Table VI. In each wish the 
per cent of cases designating a preference for immediate fulfillment is 
compared with the per cent of cases choosing deferred satisfaction. 

The deaf emphasize wishes for immediate satisfaction to a greater 
extent than the hearing, and this tendency is more pronounced among 
girls than boys. Differences between the deaf and the hearing in 
choices for immediate satisfaction are statistically significant for deaf 
boys in two of the comparisons and for deaf girls in five comparisons. 
As in the case of fears, differences between the wishes of deaf and 
hearing girls are more marked than between those of deaf and hearing 
boys. But the wishes of both sexes of the deaf in which decidedly 
larger proportions of deaf than of hearing pupils express a desire for 
immediate satisfaction are for articles that lend themselves to imme- 
diate consumption, including candy, clothes, and movies. 

In only two instances do the hearing indicate a more frequent 
preference for immediate satisfaction; a slightly larger proportion 
of hearing boys would rather have a good report card in school than 
be very smart, and somewhat more of the hearing girls would rather 
have one cent now than ten cents next week. The difference between
the wishes of deaf and hearing boys is practically negligible with
respect to having varying amounts of pennies and in the relatively
improbable situation of choosing an automobile now or a million
dollars next year.


What does this more frequent tendency of the deaf to choose 
immediate satisfaction signify? Washburne,² who examined his
subjects individually with three sets of wishes and actually carried 
out two of them, found that a consistent willingness to sacrifice an 
immediate satisfaction for a greater future satisfaction was closely 
correlated with desirable conduct among subjects between the mental
ages of eight and thirteen years. A consistent tendency in the 
opposite direction was linked with delinquency or problem behavior. 
Washburne concludes that this test measures “judgment,” a function 
which develops with mental age and chronological age, and that when 
judgment lags behind in development, “there is trouble, and when it
lags far behind, there is serious trouble.”

In the light of Washburne’s assertion, the results of the present 
study would point to a greater immaturity in judgment among the 
deaf, and particularly among deaf girls, than is found with the hearing. 
The very low correlations (Tables III and IV) obtained between 
scores on the wishes test and the factors of age and intelligence do 
not necessarily disprove Washburne’s point as the age-range of the 
present population is comparatively narrow. If additional experi- 
ments should substantiate a deficiency in foresight and planning 
ability on the part of deaf children, the educational implications are 
clear. 


SUMMARY 


A comparison was made of the responses of deaf and hearing public 
school children to a check list of thirty-nine fears, and to seven sets of 
wishes which permitted a choice between a desire for the immediate 
fulfillment of a smaller gratification and the delayed fulfillment of a
greater good. 

Deaf girls reported significantly more fears than hearing girls,
deaf boys slightly more than hearing boys. (Sex differences were
greater than differences between the deaf and the hearing.)

Deaf boys and girls expressed a greater number of wishes for 
immediate satisfaction than the hearing. Differences between the 
deaf and the hearing in this respect were larger than between the 
sexes. Deaf girls expressed more fears than any other group, and 
also the largest number of wishes for immediate satisfaction. 

Correlations between scores on the fears and wishes tests with the 
factors of age, intelligence, age at becoming deaf, and per cent of 
hearing in the better ear were slight. 

With respect to the six most frequently named fears, there was 
complete agreement among the deaf and the hearing of both sexes.
On the wishes test, the deaf exceeded the hearing in wishes for immedi- 
ate gratification, and particularly in those items that involved articles 
for immediate consumption. 


PROFESSED ATTITUDES AND ACTUAL BEHAVIOR


STEPHEN M. COREY 
University of Wisconsin 
INTRODUCTION 


Granting the significance from certain points of view of verbal 
opinions as such, they are of limited practical value unless they 
presage behavior. It is of interest to determine what a subject 
says his attitude is in regard to communism, the church, or foreign 
missions, but of greater moment sociologically is the way he acts in 
relation to these institutions. Most psychologists and sociologists 
are implicitly, at least, in agreement on this point as is made evident 
both by their definitions of social attitudes and by the statistical 
data advanced in defense of published scales. Droba,’ Allport,' 
Bain,? Cantril,* Faris,* and others have reported large numbers of 
definitions which have in common an insistence that a social attitude 
of a particular sort predisposes one to behave in a particular manner. 
Words such as “preparatory for” or “indicative of” behavior are
common in these definitions. 

In view of this general agreement as to what a social attitude is, 
it is possible to state one criterion of the validity of social attitude 
questionnaires—namely, the relationship between questionnaire scores 
and overt behavior. In other words, if a social attitude is a deter- 
miner of overt behavior, social attitude questionnaires may be con- 
sidered valid if they make possible predictions of overt behavior. 
Under some circumstances behavior might be consciously engaged in to 
give a false impression of an attitude, but in the long run the only 
evidence for the insincerity of such an expression would be more 
behavior under different circumstances. In the last analysis, the 
way a person acts over a period of time is a reliable and valid indication
of his attitudes. 

If this concept of attitude questionnaire validity is granted, it is 
rather surprising that so few investigations have been undertaken to 
determine the relationship between verbal opinions and overt behavior. 
As might have been predicted in light of what we know about the 
history of the testing movement in general, the investigators who 
have developed the social attitude questionnaires have apparently 
been much more concerned with reliability than with validity. Con-
sequently, very reliable instruments called attitude scales are com-
monly employed in psychological and sociological investigations 
but the validity of these scales, in the sense of the writer, has either 
been taken for granted or has been demonstrated by administering the 
questionnaires to groups commonly believed to represent varying 
attitudes with respect to the institutions, objects, or practices in 
question. A number of years ago Bain² called attention to this
lack of dependable evidence for validity and contended that while
in most cases a relationship between verbal and overt behavior is
assumed, “. . . this relationship must be determined . . . before
the study has any great value.”

There are a considerable number of investigations which imply the 
validity of attitude questionnaires. The implication, however, is 
not quantitative in any statistical sense but is based on common 
sense evidence. For example, Smith,” found that her “Attitude
Toward Prohibition” scale, when administered to college students, 
Y.W.C.A. workers, Methodists, and business men, yielded scores in 
keeping with the attitude one might expect these groups to hold. 
Our very expectations, however, are in most cases based upon verbal 
opinions rather than observations of overt behavior so that some 
degree of relationship would be inevitable. It would be more sig- 
nificant could we know for these same groups the degree in which their 
attitude scale scores were indicative of their actual drinking practices. 

Rogers” reported analogous results after administering a ques-
tionnaire involving attitudes toward war to students taking advanced
R.O.T.C., basic R.O.T.C., and a control group taking neither. Prac- 
tically every question was reacted to differently by the three groups— 
the men in the advanced R.O.T.C. courses expressing attitudes 
significantly more sympathetic to war while those in the control 
group were most antipathetic. These results are a bit more pertinent 
in that by and large the men in the advanced R.O.T.C. units were, 
by their very interest in such an activity, manifesting overt behavior 
indicative of some sympathy for militaristic activities. 

Porter!® found that the amount of military training and attitude 
toward war correlated +.30 ± .03 for a group of five hundred sixty-two
college men. This datum implies validity for the questionnaire used 
in as much as the men expressing the most militaristic attitudes had 
elected to continue military training for the longest period. This 
need not necessarily mean, however, that the military training resulted 
in changed attitudes because no retest was administered involving any 
appreciable number of subjects. The test was readministered one
year later to only nineteen men who had undergone military training
during this period and, paradoxically enough, their final scores indi-
cated a slight shift toward pacifism. The number of cases involved
was obviously too small to permit any generalizations.

Sims and Patrick” administered Hinckley’s “Attitude Toward 
the Negro” scale to northern and southern university students and 
reported that the latter were significantly more antagonistic to 
Negroes. These results are in harmony with those reported by 
Garrison and Burch,” Hinckley," Likert,’* and others and seem as 
well to coincide with our common sense notions. Reinhardt,’* on 
the other hand, using Bogardus’ “Social Distance” method, found 
that students in North Dakota were more prejudiced toward Negroes 
than were students in West Virginia. While Reinhardt’s groups were 
very small his results imply a criticism of the validity of attitudinal 
investigations based upon statements of verbal opinions. Katz 
and Allport’? also found that Syracuse University students from the 
North were more resentful of Negroes than were their classmates 
from the South. 

Droba® administered his “Attitude Toward War” scale to some
one thousand college students and found that men who had seen 
war service were slightly more militaristic than those who had not. 
This result would constitute an argument either for or against the 
validity of the questionnaire depending upon whether the investigator 
thought that military experience was conducive to the development 
of militaristic or pacifistic attitudes. Because both points of view 
have been advanced the argument is circular. Droba’s report is of 
particular interest in view of Porters'® statement that for his subjects 
military training in college resulted in at least a noticeable trend toward 
pacifism. The twenty-one Socialists who were included in Droba’s 
study were decidedly more pacifistic than members of either the 
Republican or Democratic parties. This is in agreement with rather 
commonly accepted beliefs about Socialists. 

Stalnaker,²² using Thurstone’s technique, investigated the atti-
tudes developed toward intercollegiate athletics by various groups,
and found, in general, what might have been predicted. Athletes 
and parents were in general most sympathetic and faculty and adminis- 
trative officers were most antagonistic. An interesting check upon 
the common sense evidence for the validity of Stalnaker’s question- 
naire might be systematic observation of the behavior of some of his 
subjects when intercollegiate athletics were involved. 

These investigations, and there are others like them, do present 
some indirect evidence almost of an anecdotal sort for the validity of 
attitude questionnaires. They indicate that in general and roughly
speaking certain groups express attitudes such as might be expected 
under the circumstances. In but one or two instances, however, was 
actual overt behavior compared with attitude scale results and in 
no case was the amount of the overt behavior accurately estimated. 
On the contrary, attitude scale scores, based upon verbal opinions,
were compared with what is generally expected of certain groups, 
which expectations are also in large part based upon verbal opinions. 

There is another group of investigations which come a bit nearer 
the heart of the matter. These are not quantitative in any complete 
sense, but the actual behavior with which attitudinal statements are 
compared was estimated more objectively than was the case in the 
studies summarized above. Zimmerman,”® for example, investigated 
the attitudes of Minnesota farmers toward coöperative buying and
selling and then related their verbally expressed attitudes to the
number of years’ experience each farmer had had in the coöperative
movement. He reported a high degree of curvilinear correlation
between these two variables. In Zimmerman’s study those groups
most favorable in their attitudes toward coöperative buying and
selling had been actively engaging in such practices for the longest 
time. The attitude expressed by these Minnesota farmers had 
validity in terms of the definitions of attitudes most commonly 
advanced. It was definitely indicative of behavior. 

Stouffer,?* in his comparison of statistical and case history methods 
of attitude research found a correlation of +.86 between “Attitude
Toward Prohibition” scale scores and attitudes as inferred from 
autobiographies describing the subjects’ experiences with liquor. 
This relationship is indicative of the validity of the attitude scales, 
in the sense the writer is using, only if it is assumed that the auto- 
biographical sketches were descriptive of actual experiences. Such 
descriptions are subject to many of the same limitations with regard 
to their validity as are attitude questionnaires. 

Stagner and Drought²¹ checked the validity of their scale measuring
the attitude of children toward their parents in much the same fashion 
—by comparing attitude scale results with autobiographical materials. 
Again, it may be said that both of these sources for inferring attitudes 
are subject to the same inherent limitations. In the Stagner-Drought 
study some of the biographical materials appear to have been supplied 
by individuals other than the subjects, which would enhance their 
value for the purpose of validating the attitude scales. 
These three studies indicate rather definitely that the attitudinal
opinions did accord, first, with what might be expected from knowing
certain aspects of the subjects’ behavior (Zimmerman) and, second, with
their own description of their past behavior (Stouffer; Stagner and
Drought). No study has been made, so far as the writer is aware,
in which it was possible to get a rather accurate quantitative estimate 
of degree of attitude as expressed in statement form and in addition 
an equally reliable measure of behavior, bearing upon the same insti- 
tution or practice, with which comparisons might be made. The 
present investigation is of this sort. It is a comparison of scores 
made on an attitude questionnaire pertaining to honesty in the class- 
room with actual cheating in the classroom. 


SUBJECTS AND METHODS 


The subjects were sixty-seven university students taking an 
introductory course in educational psychology. Each Friday, for 
five weeks, an objective true-false examination was administered 
covering the week’s work. These tests ranged in length from forty 
to forty-five items. The papers were given back to the students for 
grading at the next class meeting. In the meantime, they had been 
scored accurately but no marks were placed upon them. The differ- 
ence between the true score and the score the student reported for 
himself was the basic cheating index. The students were told to 
mark the statements with symbolic “pluses” and “minuses” so that
cheating, providing the intent was present, was very easy. No 
attempt was made to supervise the students’ scoring of their own 
papers. 

Objections to this technique are rather obvious. It might be 
contended that university students would very quickly suspect some 
ulterior motive in such a procedure, and behave a bit contrary to 
their custom. Certainly, if even a small group was of the opinion 
that its honesty was being measured, the news would soon get around. 
Under the circumstances, the only check possible upon the circulation 
of such a rumor would be the tendency for cheating to decrease from 
week to week. The data presented in Table I indicate that there 
was no such decrease. While there was variation in the amount of 
cheating from week to week, the amount of dishonesty in grading the 
last test was almost as great as that on the first. Apparently the 
students’ practices in regard to cheating were not complicated by 
fears that their dishonesty was regularly detected. The chief deter-
mining factor seemed to be, as will be indicated later, the difficulty
of the test. 


TABLE I.—MEAN NUMBER OF POINTS REPRESENTING CHEATING ON EACH OF
FIVE TESTS


[Table values are not legible in the source.]

The attitude questionnaire used has been described elsewhere.* It 
was constructed somewhat after the Thurstone™* technique with 
modifications suggested by Seashore and Hevner.** The question-
naires were scored after the fashion described by Likert et al.¹⁴ The
corrected reliability of the questionnaire was +.907 ± .02* when
signed and +.913 ± .02 when unsigned. By a system of secret
identification marks it was possible to obtain both signed and unsigned 
questionnaires from each student in such a manner that the unsigned 
papers could later be identified. Although the differences between 
the two questionnaires were not statistically significant® those that 
were unsigned consistently indicated a more sympathetic attitude 
toward cheating. Because such seemed to be inherent evidence of 
their greater validity the unsigned scales were used in all of the 
computations reported below. 

Measures of Cheating.—As has been suggested, the basic cheating 
score was the gross difference between the score secretly given by the 
instructor and that given to himself by the student. For example, 
the student who reported a grade of 43 for himself after the paper 
had been scored by the instructor with a resulting grade of 35, was 
given a cheating score of 8 for that examination. The reliability 
of this cheating index was none too satisfactory. When a Pearson 
product-moment reliability coefficient was computed between gross 
cheating scores on the first two and last two tests and stepped up by 
use of the Spearman prophecy formula the result was +.65 ± .07.
Because of the asymmetry of the data this coefficient was checked 
against a similar index resulting from the use of Sheppard’s method 
of unlike signs. There was no appreciable difference between the 
two coefficients. 
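
The stepping-up described above can be sketched in a few lines. The sketch assumes that the “Spearman prophecy formula” is the formula now usually called Spearman-Brown, r_full = k·r / (1 + (k − 1)·r), with k = 2 because the two halves (tests 1 and 2 against tests 4 and 5) are combined into one measure. The function names are illustrative, and the half-test correlation of about .48 is only what the reported +.65 would imply; it is not a figure given in the study.

```python
def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Estimate the reliability of a measure lengthened k times, given the
    correlation r_half between two comparable parts."""
    return k * r_half / (1.0 + (k - 1.0) * r_half)


def half_correlation_implied(r_full: float) -> float:
    """Invert the k = 2 case: the correlation between halves implied by a
    stepped-up coefficient r_full."""
    return r_full / (2.0 - r_full)


# Illustrative only: a stepped-up coefficient of .65 implies that the two
# halves correlated roughly .48 before the correction was applied.
r_half = half_correlation_implied(0.65)
r_full = spearman_brown(r_half)
```

Read this way, the uncorrected agreement between early and late cheating scores would have been under .50, which fits the remark that the reliability of the index was none too satisfactory.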





* All sigmas are approximate and were obtained from nomographs. 
Another cheating index was computed by getting the relationship
between this gross cheating score and the proximity of the student’s 
actual score to the maximum possible on that particular examination. 
This index made it possible to control in a measure the variable factor 
of temptation to cheat. For example, if a student’s true score on a
forty-five point test was 25, and if he raised his score to 35 by changing
ten questions, his cheating index corrected for “temptation to cheat,”
so to speak, would be .5; or 10 (the amount he cheated) divided by 20
(the difference between his true score and the maximum possible
score). Similarly, a student whose true score on a 45-point examination
was 35 but who reported for himself a grade of 40 would have the same
“corrected” cheating index. The latter student cheated but half as
much, but the temptation to do so was not so great.
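
The two indexes can be written out as a short sketch that reproduces the worked figures in the text. The function names are illustrative. The handling of a perfect true score, where there is no room to cheat and the ratio is undefined, is an assumption, since the text does not say how such cases were treated.

```python
from typing import Optional


def gross_cheating_score(reported: int, true_score: int) -> int:
    """Points the student added to the score secretly given by the instructor."""
    return reported - true_score


def corrected_cheating_index(
    reported: int, true_score: int, max_score: int
) -> Optional[float]:
    """Gross cheating divided by the 'temptation to cheat', i.e. the gap
    between the true score and the maximum possible score."""
    room = max_score - true_score
    if room == 0:
        # Assumption: the index is left undefined for a perfect paper.
        return None
    return gross_cheating_score(reported, true_score) / room


# Worked figures from the text.
example_gross = gross_cheating_score(43, 35)             # 8 points
example_first = corrected_cheating_index(35, 25, 45)     # 10 / 20
example_second = corrected_cheating_index(40, 35, 45)    # 5 / 10
```

Both students in the text receive the same corrected index even though the first added twice as many points, which is the sense in which the index controls for temptation.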


FINDINGS 


Range of Scores.—The maximum score on the attitude question- 
naire was 250 and the minimum 50. Because of the scoring method 
each statement on the fifty-item questionnaire was given a maximum 
point value of five, which indicated sympathy for the practice of 
cheating, and a minimum value of one, indicating antipathy for cheat- 
ing. The mean obtained score for the sixty-seven students was 
133.48 with a standard deviation of 21.84. The maximum actual 
cheating score in terms of the number of points a student so inclined 
might add to his actual score for all examinations was 225, or the total 
number of items on all five tests. The mean cheating score for all 
five tests was 9.03, with a standard deviation of 12.66—slightly less 
than two points per test. The distribution of the actual cheating 
scores was markedly skewed to the right. Twenty-four per cent 
of the subjects did not cheat on any of the five tests. One student, 
on the other hand, raised his score an average of twelve points (about 
twenty-five per cent) on each test. 

Correlation between Gross Cheating Score and Attitude Questionnaire
Score.—The Pearson product-moment coefficient of correlation
between gross cheating scores and attitude questionnaire scores was
practically zero. The obtained coefficient was +.024 ± .12 SD which
when corrected for attenuation became +.032 ± .12 SD. In other
words, the attitude questionnaire used—a highly reliable one—gave 
no hint as to how students would behave. Verbal opinions regarding 
cheating on examinations were unrelated to actual cheating practices. 
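
The correction for attenuation can be illustrated as follows. The sketch assumes the usual formula, in which the obtained correlation is divided by the square root of the product of the two reliabilities, and it assumes that the reliabilities entered were those reported earlier (+.913 for the unsigned questionnaire and +.65 for the cheating index). The text does not state which values were actually used, so the result is only expected to fall near the reported +.032.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimate the correlation between two measures if both were perfectly
    reliable, given their obtained correlation and their reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Assumed inputs: obtained r = .024, questionnaire reliability = .913,
# cheating-index reliability = .65.
corrected = correct_for_attenuation(0.024, 0.913, 0.65)
```

Under these assumptions the corrected value comes out at roughly .03, so the correction leaves the coefficient practically at zero.
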
Because of the lack of symmetry of the distribution of the cheating
scores, the zero scores were eliminated and a coefficient computed
between attitude questionnaire scores and the cheating score for those
fifty-two students who cheated. The resulting coefficient was
+.014 ± .13 SD.
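
The size of the sigmas attached to these coefficients can be checked roughly. The sketch below assumes the classical large-sample formula for the standard error of a product-moment coefficient, (1 − r²) / √N. That the author's nomographs were based on this formula is an assumption, and the footnote notes that all sigmas are approximate.

```python
import math


def standard_error_of_r(r: float, n: int) -> float:
    """Classical large-sample standard error of a Pearson coefficient."""
    return (1.0 - r * r) / math.sqrt(n)


# With the sixty-seven students and r = .024 the formula gives about .12.
se_all_students = standard_error_of_r(0.024, 67)
```

With a standard error of about .12, a coefficient of +.024 is far smaller than its own sampling error, which is why it is described as practically zero.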

No correlation ratios were computed for two reasons,® first, the 
number of cases was too small to make such an index very significant, 
and, secondly, the correlation table did not indicate clearly any degree 
of non-linear relationship despite the implication that such might 
be the case in view of the data presented in Table II below. 

Another method of expressing the lack of relationship between 
the unsigned attitude questionnaire scores and the gross cheating 
scores is by indicating the mean cheating scores for students ranking 
in the different quarters of the attitude questionnaire. These data 
are set forth in Table II. The lack of symmetry of the distributions 
is again illustrated in the unusually large standard deviations. The 
difference between cheating scores for students in the highest and the 
lowest attitude questionnaire quarters is less than its standard devia-
tion. The differences in cheating scores between the middle two and
either the lowest or highest questionnaire quarters were equally 
insignificant statistically. 
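
The comparison of the highest and lowest quarters can be checked with a short sketch using the figures of Table II. It assumes the standard error of a difference between independent means, √(s₁²/n₁ + s₂²/n₂), and it reads the last row of the table (17, 34, 16) as the number of cases in each group. Both points are assumptions, as the text does not give the formula.

```python
import math


def se_of_difference(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Highest quarter: mean 8.63, SD 11.8, 16 cases.
# Lowest quarter:  mean 6.67, SD 11.60, 17 cases.
difference = 8.63 - 6.67
se_difference = se_of_difference(11.8, 16, 11.60, 17)
ratio = difference / se_difference
```

On these assumptions the difference of about 1.96 points is less than half of its standard error of roughly 4.1, in keeping with the statement in the text.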


TABLE II.—CHEATING SCORES FOR STUDENTS SCORING IN DIFFERENT QUARTERS
OF THE ATTITUDE QUESTIONNAIRE


                          Quarters of the attitude questionnaire
                          Lowest     Middle two     Highest

Mean cheating score       6.67       9.91           8.63
Standard deviation        11.60      12.15          11.8
Number of cases           17         34             16


Relation between Cheating and Temptation to Cheat.—Defining 
“temptation to cheat’’ as it was above, namely, the difference between 
the true score on the test and the maximum possible score, and correlat-
ing this index with the actual cheating score yielded a Pearson coeffi-
cient of +.46 ± .09 SD. Whether or not a student cheated depended
in much larger part upon how well he had prepared for the examination 
than upon any opinions he had stated about honesty in examinations. 
This is also brought out in the mean cheating score for test 2 as given 
in Table I above. This particular test was by far the easiest of the
series and consequently presented the least temptation to cheat. 

When scores on the attitude questionnaire were correlated with 
cheating scores with temptation to cheat eliminated as a factor* 
the coefficient was +.13 ± .12 SD, an insignificant and unreliable
relationship. This coefficient seemed to imply further lack of validity 
on the part of the attitude questionnaire used. 

Conclusions.—The data presented in this study show that overt 
behavior, as measured by the amounts students will change their 
test papers when allowed to do their own grading, is not related to 
attitudinal scores derived from a highly reliable questionnaire measur- 
ing verbal opinions toward cheating on examinations. 


DISCUSSION 


It is impossible to say in advance of investigation whether the 
lack of relationship reported here between attitude questionnaire 
scores and overt behavior is generally true for measures of verbal 
opinion. Were that the case, the value of attitude scales and question- 
naires would for most practical purposes be extremely slight. It would 
avail a teacher very little, for example, so to teach as to cause a change 
in scores on a questionnaire measuring attitude toward Communism 
if these scores were in no way indicative of the behavior of his pupils. 

It is difficult to devise techniques whereby certain types of overt 
behavior can be rather objectively estimated for the purpose of com-
parison with verbal opinions. Such studies, despite their difficulty,
would seem to be very much worthwhile. It is conceivable that our 
attitude testing program has gone far in the wrong direction. The 
available scales and technics are almost too neat. The ease with 
which so-called attitudinal studies can be conducted is attractive but 
the implications are equivocal. 


THE MEASUREMENT OF ATTITUDE TOWARD WAR
AND THE GALVANIC SKIN RESPONSE 


S. N. F. CHANT AND M. D. SALTER 


University of Toronto 


The more that is known about any particular quantity the more 
significant becomes its measurement. Hence, along with the develop- 
ment of adequate methods for measuring a quantity there arises a 
demand for a more complete description of the quantity measured. 
Thus recent developments in the field of attitude measurement make 
it increasingly important to understand the nature of the quantity 
measured. When relatively adequate scales for measurement are 
available the most useful information concerning the quantity which 
is measured comes about through a use of the measuring scales them- 
selves. The standard procedure in this regard is to relate the measure- 
ments which are obtained by the scales to some other variables that 
are thought to be significant, thus bringing to light relationships which 
the quantity measured bears to these variables. This study is an 
investigation of the relationship between attitudes as measured by 
the “Attitude Toward War Scale,” No. 2, Form A, constructed by
D. D. Droba and edited by L. L. Thurstone,¹ and the galvanic skin
response, which has frequently been employed as a measure of emo- 
tional response. 

The “Attitude Toward War Scale” consists of twenty-two state-
ments of opinion ranging from the most militaristic with a scale value
of .06 to the most pacifistic with a scale value of 10.7. These state- 
ments were read to each subject and the score of the individual was 
computed as the median value of all the statements with which he 
expressed agreement. 
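
The scoring rule just described amounts to taking a median, as the following sketch shows. The scale values in the example are hypothetical and are not taken from the published scale; the function name is likewise illustrative, and the treatment of a subject who agrees with no statement is an assumption.

```python
from statistics import median
from typing import Sequence


def attitude_score(endorsed_scale_values: Sequence[float]) -> float:
    """Score a subject as the median scale value of the statements with
    which he expressed agreement."""
    if not endorsed_scale_values:
        # Assumption: a subject who endorses nothing cannot be scored.
        raise ValueError("no statements endorsed")
    return median(endorsed_scale_values)


# Hypothetical subject agreeing with three statements.
score_three = attitude_score([5.4, 7.2, 8.1])
# Hypothetical subject agreeing with four statements; the median is then
# the mean of the two middle values.
score_four = attitude_score([5.4, 6.6, 7.2, 8.1])
```

Because the median is used, one endorsed statement with an extreme scale value shifts the score less than it would shift a mean.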

A standard Wheatstone Bridge and galvanometer with copper 
electrodes attached to both hands of the subject were employed for 
measuring the galvanic skin response. The galvanic deflections, as 
reflected on a translucent scale set one meter from the galvanometer, 
were traced by a manually operated marker on a kymograph. Upon 
this kymograph was also recorded the reading time of each statement 
and the response time of the subject. The subject was seated with 
arms resting on a table in a room relatively free from distractions and
completely separated from that containing the galvanometer and 
recording apparatus. 

The subjects were thirty-three university students, fourteen of 
whom were men. Although nine of the subjects were chosen from the 
Canadian Officers’ Training Corps in an attempt to obtain militaristic 
subjects, there was only one subject who measured mildly militaristic, 
the others being either pacifistic or neutral. Table I shows the
distribution of scores for the subjects. 


TABLE I.—DISTRIBUTION OF SCORES ON THE “ATTITUDE TOWARD WAR SCALE”
FOR THIRTY-THREE SUBJECTS


Scale interval        Frequency

 0- 2.9                   0
 3- 3.9                   0
 4- 4.9                   1
 5- 5.9                   8
 6- 6.9                   1
 7- 7.9                  15
 8-11.0                   8

Total                    33

[The labels of the “Grouping” column are not legible in the source.]


The subjects were given the following instructions: 


We are going to read you a list of statements with each of which you will 
agree or disagree. If you agree with a statement say “yes” if you disagree 
say “no.” For instance, “A man’s friends are his most valuable possessions.”
If you think they are his most valuable possessions answer “yes,” if not,
answer “no.” 

This is in no sense an examination. We are not interested in your answers 
except as an indication of opinion. No opinion will be regarded as right or 
wrong. Most of the questions you will be able to answer almost immediately. 
If you wish to think about them for a short time before answering do so.
However please do not ponder them longer than necessary. Some of the
statements may have philosophical or moral implications which you are to
ignore unless they form the basis of your opinion. We want your own opinion
and we are not interested in the speed of your answers, although we should
like you to answer fairly promptly. There may be a rather long pause between
some of the statements, during which you should hold yourself in readiness
for the next statement. (The subject is seated comfortably and the electrodes
attached.) 

Please do not touch your hands together. It is very important that you 
do not move your hands at all, for any considerable movement or even a cough 
or sigh will interfere with the results. So please be as quiet as you possibly 
can. 

Remember if you agree with a statement say “yes” if you disagree say 
“no.” If you cannot hear the statement plainly say “please repeat.”


The usual procedure with the galvanic skin response is to allow an 
accommodation period during which the subject becomes adjusted to 
the conditions of the experiment and his resistance becomes stabilized. 
In this instance it was observed during preliminary experimentation 
that the stable level of resistance reached during such an accommoda- 
tion period was so upset by the first stimulus that rebalancing became 
necessary. Also a sufficient accommodation period so prolonged the 
experiment that the subject became restless toward the end. For 
these reasons it seemed preferable to substitute a period of preliminary 
stimulation in place of the rest period. The preliminary stimuli were 
ten statements similar in form to those of the Attitude Toward War 
Scale but quite irrelevant to the war peace statements. During this 
period a level of resistance was established to which the subject tended 
to return after each successive response. Once this balance was estab- 
lished there was no further alteration of the bridge resistance. 

Following the list of preliminary statements and without interrup- 
tion for further instructions, the first statement of the Attitude Toward 
War Scale was read to the subject, the rest of the statements following 
in order. With sixteen of the subjects the order of the statements was
reversed so that any progressive change of resistance on the part of the 
subject could be corrected for in determining the deflection value of 
each statement. 

After each response of the subject to a statement, time was allowed
for the resistance of the subject to return to the level at which the 
preliminary balance was taken. When this was reached a signal light 
out of the subject’s range of vision was flashed for the next statement 
to be read to the subject. Occasionally due to “spontaneous” deflec- 
tions a stimulus was given when the marker swung to the balancing 
level even though it did not become stable at this level. On these 
occasions the characteristic response and return to the stable level 
always followed the next stimulus. These “spontaneous” deflections
have been noted by other investigators and in some instances appear
to be related to the stimulus. In this experiment, however, they were
observed to occur most frequently in connection with distracting 
noises, movements, coughs, and sighs. 

The completed galvanic record for any subject consisted of a con- 
tinuous irregular tracing with very marked deflections for each stimulus 
statement. These deflections were measured in centimeters from the 
level at which the stimulus was presented to the highest point reached 
before the return to balance. Since the deflections were so definite 
and occurred with such regularity along with the stimulus statements, 
the extent of the deflection was accepted as a measure of the galvanic 
skin response without regard for any “spontaneous” deflections or
minor irregularities. Figure 1 provides a sample taken from a record.

Fig. 1. Sample of galvanic skin response record.


A measure of the time of the subject’s response was recorded on the 
kymograph record. This time between the end of the stimulus statement
and the subject’s verbal response is termed the “decision time.”
In the case of repeats, of which there were twenty-four in one hundred
seventy-six instances, the time was taken from the end of the first
presentation of the statement to the response. A record was also 
obtained of the time required to read the statement to the subject. 

Our first problem was to determine if there were any significant 
differences between the mean galvanic deflections for the twenty-two 
statements of the attitude scale. The mean galvanic deflection of any
statement is the mean of the standard scores $\left(\frac{X - M}{\sigma}\right)$ of the
deflections occurring for all subjects for that statement. These differences
proved to be insignificant since the most extreme difference obtained 
was .92 with a SD difference of .75. In fact both the size of the
differences of deflection caused by the various statements for any one
subject, and the differences between the deflections for any one statement
from one subject to another were frequently greater than the
differences between the mean deflections of the statements. This would
suggest the following assumptions: (i) The galvanic skin response 
may be too variable to yield reliable results. (ii) Attitudes may be 
too personal and individual for the data to be dealt with by averaging 
results obtained from several subjects. 
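
The computation described above can be sketched as follows. This is only an illustration under stated assumptions: the deflection values are hypothetical, and it is assumed that each subject's deflections were converted to standard scores against that subject's own mean and standard deviation, which the text implies but does not state outright. The last line forms the ratio of the most extreme difference (.92) to its SD (.75) as reported.

```python
import statistics


def standard_scores(deflections):
    """Convert one subject's raw deflections to standard scores (X - M) / sigma."""
    mean = statistics.fmean(deflections)
    # Assumption: the SD is taken within the subject, as a population SD.
    sigma = statistics.pstdev(deflections)
    return [(x - mean) / sigma for x in deflections]


def mean_standard_score_per_statement(records):
    """records[s][i] is the raw deflection of subject s for statement i."""
    z_by_subject = [standard_scores(subject) for subject in records]
    n_statements = len(records[0])
    return [
        statistics.fmean(z[i] for z in z_by_subject)
        for i in range(n_statements)
    ]


# Hypothetical deflections in centimeters: three subjects, four statements.
records = [
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 1.5, 3.0, 2.5],
    [5.0, 3.0, 4.0, 8.0],
]
statement_means = mean_standard_score_per_statement(records)

# Ratio of the most extreme difference reported in the text to its SD.
ratio = 0.92 / 0.75
print([round(m, 2) for m in statement_means], round(ratio, 2))
```

Because each subject's standard scores average zero, the statement means also sum to zero; a statement can stand out only in relation to the others.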

To determine further the relationship of the galvanic skin response 
to the statements of the attitude scale the following raw correlations 
were obtained. (1) A correlation of +.72 ± .07 was found between
the decision times and the deflections for the twenty-two statements. 
This indicates first that the measurement of the galvanic skin response 
was sufficiently reliable to yield a high correlation, thus denying 
assumption (i) of the preceding paragraph. It indicates further a 
definite relationship between the decision times and the amplitude of 
the deflections for the statements. (2) The correlation between 
decision times and scale values of the various statements was .00 ± .17.
Since the scaling of the statements was done on a basis quite different 
from the acceptance or rejection of a statement, the absence of relation- 
ship in this instance is to be expected. This bears out Thurstone’s 
claim that in the construction of such scales we are not concerned with 
the subjects’ acceptance or rejection of statements. (3) Between 
the deflections and the scale values of the statements the correlation 
was -.32 ± .15, which indicates some tendency for the larger
deflections to occur with the militaristic statements rather than the pacifistic
statements. Since the subjects were predominantly pacifistic it would 
appear that the statements which were endorsed showed relatively less 
deflection than those rejected. 
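
The figure following each ± sign is not defined in the text. A minimal sketch, assuming it is the conventional probable error of a product-moment coefficient, .6745(1 - r²)/√N, and assuming N is the twenty-two statements, is given here. Both assumptions are the editor's; the function name is hypothetical.

```python
import math


def probable_error_of_r(r, n):
    """Conventional probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Correlation between decision times and deflections, assuming n = 22 statements.
pe = probable_error_of_r(0.72, 22)
print(round(pe, 2))
```

Under these assumptions the formula gives .07 for the coefficient of +.72, in agreement with the reported value. It does not reproduce every reported figure with N = 22 (for example, .00 ± .17), so the number of cases may have differed for some coefficients.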

An additional variable of the scale is the length of the various 
statements. When the time required to read the statements was 
correlated with galvanic deflection a relatively insignificant coefficient 
of +.24 ± .14 was obtained, indicating a slight tendency for the longer
statements to occasion greater deflections. A negligible correlation
of +.04 ± .14 was obtained between the time required to read the
statements and the decision time. When the time required to read 
the statements is partialed from the correlation between the decision 
time and galvanic deflection the coefficient is scarcely altered, being 
raised from +.72 to +.73. This indicates that the time required to 
read the statements has practically no bearing upon the relationship
found between the decision time and the deflections. 
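
The partialing described above can be checked with the usual first-order partial correlation formula. The sketch below uses the three coefficients reported in the text; the function name is hypothetical, and the formula is the standard one rather than a procedure the authors spell out.

```python
import math


def partial_correlation(r12, r13, r23):
    """First-order partial r12.3: correlation of 1 and 2 with 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))


# 1 = decision time, 2 = galvanic deflection, 3 = time required to read.
r_partial = partial_correlation(r12=0.72, r13=0.04, r23=0.24)
print(round(r_partial, 2))
```

With reading time correlated only +.04 with decision time and +.24 with deflection, the numerator and denominator are both reduced slightly, and the coefficient rises from +.72 to about +.73, as reported.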

If we consider the scale as extending from a point of neutrality to 
extremes in both the pacifist and the militaristic directions we find a 
negligible correlation of +.11 ± .14 between the degree of neutrality
of a statement and the galvanic deflection. There is a more significant
correlation of +.47 ± .11 between the neutrality of a statement and
the decision time, indicating that the neutral statements require a
somewhat longer decision time. This is quite in accord with the 
absence of relationship found previously between decision time and 
scale value. 

A further variable is that each subject endorses some statements 
and rejects others. To determine what bearing this had upon the 
galvanic skin response the mean deflection for all statements endorsed 
by each subject was found and compared with the mean deflection 
for all statements rejected. The mean deflections for the statements 
endorsed are negative for all but ten subjects, indicating that they are 
below the general average for all statements. On the average, the
deflections for the statements which were rejected were .71 ± .16 greater
than those for the statements with which agreement was expressed.
Similarly the average time of decision for the statements rejected was
2.6 ± .08 secs. longer than the time required for the endorsed statements.
This would indicate
that the statements in agreement with the subject’s attitude occasion 
less emotional disturbance and are more promptly answered than 
those which are judged to be in disagreement. This suggests that 
an attitude may be more definitely defined in terms of that which is 
acceptable to the subject than that which is rejected. Apparently 
there is less doubt concerning those statements which express the 
attitude positively. This tends to confirm the practice of measuring 
attitudes in terms of the opinions endorsed. 

Summarizing the findings considered from the standpoint of the 
scale values, three significant points appear. In the absence of any 
relationship between scale value and galvanic deflection, the definite 
relationship found between decision time and galvanic deflection 
indicates that the galvanic response as an indication of emotional 
disturbance is occasioned by the difficulty which the subject finds in 
reaching a decision rather than by the nature of the opinion expressed 
by the statement. It would appear that the emotional disturbance 
is a concomitant of the decision rather than of the attitude as such. 
There is also evidence that the statements which are rejected occasion
more hesitation upon the part of the subject and result in more emo- 
tional disturbance as indicated by the larger deflections. Finally 
there is a tendency for neutral statements to require a longer decision 
time than do the extreme statements. 

The same data may be considered from the standpoint of the scores
upon the Attitude Toward War Scale obtained by the thirty-three
subjects. When the subjects’ scores were correlated with their average
deflections calculated in ohms an insignificant coefficient of -.13 ± .11
was obtained. This suggests, if anything, that subjects with extreme 
attitudes show relatively less deflection than neutral subjects. 

The coefficient of correlation between the subjects’ scores and 
decision times was -.68 ± .06, indicating that the pacifist subjects
(i.e. those having extreme scores) required a shorter decision time than 
the neutrals. Apparently they were more definite in their judgments. 
The correlation between the variability of the subjects’ galvanic
deflection (calculated as $\frac{100\sigma}{M}$) and decision times was -.07 ± .12;
and between variability and scores +.51 ± .09. This latter coefficient
would indicate that the pacifists, or extreme cases, were more variable 
in their responses than the neutrals, sometimes showing little deflection 
for a statement and sometimes a wide deflection, whereas the neutral 
subjects show a more consistent deflection. 

The type of attitude scale employed permits of a wide variation 
in the scale values of the statements endorsed by any one subject. 
Some subjects endorsed as many as fifteen statements ranging from 
the most militaristic to the most pacifistic. This has been observed 
by several investigators. Such a wide range of endorsation indicates 
vagueness regarding the subject’s attitude to the general question of 
peace and war, otherwise it would be difficult to explain why subjects 
endorse apparently contradictory statements. As a measure of the 
subject’s inconsistency the mean deviation of the scale values of the 
statements endorsed was calculated. When this inconsistency 
measure was correlated with the subjects’ scores the coefficient was 
-.25 ± .11. Since most of the subjects were pacifist this would
suggest that the more extreme the attitude the less the inconsistency. 
However, in addition to this coefficient being low there is also the 
statistical possibility that those subjects who endorse many and
conflicting statements must inevitably obtain a neutral score as their
median value.

When inconsistency and decision time were correlated a negligible
coefficient of +.06 ± .12 was obtained. Inconsistency and deflection
yielded a coefficient of +.30 ± .11, showing some tendency for the
inconsistent subjects to be more variable in their galvanic response. 

From these results, considered from the standpoint of the subjects’
scores, we find that those subjects with more extreme attitudes regarding
the general question of peace and war have a shorter decision time,
are somewhat more consistent in their acceptance of statements, and have
a somewhat smaller but more variable galvanic response than those
who are shown by the scale to be more neutral in their attitudes.

The data from this study indicate that attitudes are highly personal 
and individual. This is shown by the high individual variability which 
obviates the possibility of obtaining reliable differences between the 
statements in regard to the galvanic skin response. If the statements 
had the same meaning for all subjects of relatively the same score 
they would respond in relatively the same manner and some significant 
relationship would be present. On the whole the subjects are not very 
definite or consistent with regard to the general question of peace and 
war. It would appear that while a subject may be said, on the basis 
of his score, to have a general attitude toward the question of peace 
and war, his judgment is more characteristically a matter of the 
acceptance or rejection of particular statements of opinion that are 
related to the topic. The emotional component, as indicated by the 
galvanic skin response, does not appear to be closely related to the 
attitude as such. It appears to be rather a correlate of the decision
with regard to the statements. Emotional responses occur where
indecision and conflict are indicated. When the subject’s
attitude leaves little opportunity for question there appears to be the 
least emotional response. 

It is possible that aside from the variability which is due to the 
personal nature of attitudes, further variability results from the 
galvanic skin response. It also is a highly personal and individualistic 
response which is sensitive to such factors as sensory stimulation, 
ideation, bodily movement, and physiological changes as well as to 
purely emotional responses. Hence, although a very delicate indi- 
cator of disturbance, it may be too variable to employ as a measuring 
device for a single variable. However, the fact that the galvanic skin 
response is so highly correlated with decision time, which is likewise 
an individual matter, would indicate that the variability is in large 
part a function of the attitude, or of the measurement thereof. 

The measurement of attitudes may be criticized from two standpoints.
One may question the validity of the method of measurement,
or one may question the possibility of obtaining a reliable measurement 
of a function which is so variable, vague, and ill-defined as an attitude. 
Much of such criticism must deal with the rationale upon which the 
scaling method has been developed. Since Thurstone has dealt 
adequately with this there is no need to repeat it here. The other 
matter, however, concerns the nature of the attitude itself. Are there 
definite, general attitudes? When we consider the results of this and 
other studies it is difficult to say that there are. Each statement of 
the attitude scale appears to be an item in itself which although related 
to the other statements is a matter of particular consideration in itself. 
Many subjects whose scale values on the test would indicate a fairly 
definite general attitude endorse a wide range of statements which 
vary greatly in scale value and which may even be contradictory. On 
these grounds, Traxler¹ concludes that the validity of a scale which
permits such a wide range of endorsation may be questioned. This 
condition no doubt makes the reliability of the scale questionable, 
but we scarcely know enough about the nature of attitudes as such 
seriously to question the validity of the method. Is not this variability 
an inherent characteristic of the attitude itself? This study would
indicate that anything in the nature of a crystallized, inflexible attitude,
far from being the normal occurrence, is very infrequent. Indeed,
that which characterizes the clinical picture of such an abnormal 
condition as paranoia is a high consistency and inflexibility of attitude 
and opinion. The more normal condition would appear to be the 
presence of a very vaguely organized general attitude, so that one may 
be classed as pacifistic or militarist, but which permits of a wide range 
of acceptance or rejection of particular opinions. In this regard the 
attitude scale may be said to present a valid picture of attitudes, but 
probably it is of relatively low reliability in so far as the consistency of 
the measurement is concerned. 

The general belief that attitudes are highly emotional is not borne 
out by this study. The emotional disturbance appears to be rather 
a function of conflict or the difficulty of making a decision. Where 
a person knows fairly definitely what his opinion is there is little
indication of emotional disturbance. 





¹ Traxler, A. E.: “Evaluation of Scores of High-School Pupils on the Droba-Thurstone
Attitude-Toward-War Scale.” J. Ed. Psy., Vol. XXVI, 1935, pp. 616-622.

HAPPINESS AS RELATED TO PROBLEMS AND
INTERESTS 


PERCIVAL M. SYMONDS 
Teachers College, Columbia University 


The well-adjusted individual is one who, among other things, 
is pleasantly satisfied with life in its various aspects, or, to state it 
negatively, is not depressed, worried, irritated by people and
circumstances.¹ Since adjustment is an avowed aim of education,
the more that is known of the factors related to adjustment, the 
more control can be exercised to bring good adjustment about in 
children. This paper explores some of the relationships between 
happiness and the problems and interests which people have. 

The procedure in brief was to secure rankings from high-school 
students, college students, and graduate students of education as to 
their problems and interests in fifteen life areas which may be stated 
succinctly as: Health, sex, safety, money, mental hygiene, study 
habits, recreation, personal and moral qualities, family relationships, 
manners, personal attractiveness, daily schedule, civic interests, 
getting along with other people, and philosophy of life.²

At the time that the data were gathered concerning problems and 
interests, certain facts were collected (anonymously) regarding each 
individual on a special personal data sheet. This personal data sheet 
contained the following brief rating scale of happiness: 


Check one of the following groups of adjectives which best describes you.
_____ full of deep joy, excitedly happy, enthusiastic, thrilled.
_____ cheerful, successful, optimistic, lighthearted.
_____ satisfied, comfortable, life goes smoothly, peaceful.
_____ contented at times and at other times discontented, life has both
      favorable and unfavorable features.
_____ restless, impatient, uncertain, dull, cross, confined.
_____ anxious, irritated, discouraged, disappointed, discontented.
_____ gloomy, miserable, a failure, no pleasure in anything.

¹ There are other meanings of adjustment. One is also said to be well-adjusted
if he is acceptable to others. In still another sense one is well-adjusted who has
learned independence and skill in taking care of himself and is efficient in meeting
the exigencies of the natural and social environment.

² The details of the collection of the data have been fully described in the
following references:

Symonds, P. M.: “Life Problems and Interests of Adolescents.” School
Review, Vol. XLIV, Sept., 1936, pp. 506-518.
Symonds, P. M.: “The Problems and Interests of Older Adolescents” in Growth and
Development; the Basis for Educational Programs. Progressive Education
Association, 1936, pp. 74-104.
Symonds, P. M.: “Life Problems and Interests of Adults.” Teachers College Record,
Vol. XXXVIII, Nov., 1936, pp. 144-151.

The lines were given a value of 7, 6, 5, etc. to facilitate statistical 
treatment. Table I shows the number of cases and distribution of 
responses. Lines 6 and 7, and likewise lines 1, 2, and 3, have been grouped
together so as to provide sufficient numbers in each group. As was found by
Watson¹ and Sailer² the majority of the checks fall above the neutral
point of the scale. Most people prefer to think of themselves as 
relatively more happy than unhappy. 

Average ranks for each of the fifteen items were computed for 
individuals making each happiness rating separately by type of school. 
Differences were computed between those expressing happiness 
1-2-3 and 5, 1-2-3 and 6-7, and 5 and 6-7. 


TABLE I. DISTRIBUTION OF RESPONSES TO HAPPINESS SCALE*

Line of scale checked   High school   College   Graduate school   Total
6, 7                        236           96            41          373
5                           219           82            21          322
4                           407          394           117          918
1, 2, 3                      25           12             1           38
Total                       887          584           180         1651

* Grover Cleveland High School of New York City (through the courtesy of 
Charles A. Tonser, Principal) and the Junior and Senior High Schools of Tulsa, 
Oklahoma (through the courtesy of J. T. Wade, Principal of the Cleveland Junior 
High School) contributed to the high-school distributions. 

Kansas State Teachers College at Emporia (through the courtesy of Dale X. 
Zeller), Purdue University (through the courtesy of H. H. Remmers) and New 
College, Teachers College (through the courtesy of Agnes Snyder) supplied data 
on the college level. 

A section of Education 200F, Teachers College (through the courtesy of 
W. H. Kilpatrick, chairman) furnished the records on the graduate level. 





¹ Watson, G. B.: “Happiness among Adult Students of Education.” Journal
of Educational Psychology, Vol. XXI, 1930, pp. 79-109.

² Sailer, R. C.: Happiness Self-estimates of Young Men. Teachers College
Contributions to Education No. 467, Teachers College, Columbia University, 1931.

Those differences which are more than twice the standard error
of the difference are given in the accompanying Table II.

TABLE II. RELIABLE DIFFERENCES BETWEEN HAPPY AND UNHAPPY IN RANKING
FIFTEEN ITEMS FOR PROBLEMS AND INTERESTS

Columns: Happy minus average | Happy minus unhappy | Average minus unhappy

High-school Students
  Problems
    [illegible] ............................ -1.13
    [illegible] ............................ -.90  -2.24
    [illegible] ............................ +1.70
    [illegible] ............................ +1.14
    Family relationships ................... -1.12
    [illegible] ............................ +.7
    [illegible] ............................ +1.66
    [illegible] ............................ +.81  -1.90  -2.05
    [illegible]
  Interests
    Personal attractiveness ................ +1.94
    [illegible] ............................ -1.70
College Students
  Problems
    [illegible] ............................ -1.70  -2.30
    [illegible] ............................ +1.27
    Getting along with others .............. -1.62
  Interests
    [illegible] ............................ -2.63
    [illegible] ............................ +3.93  +3.60
    [illegible] ............................ +3.85  +3.55
Graduate Students
  Problems
    [illegible] ............................ -2.72

The outstanding fact from these data is that happy and unhappy
have remarkably the same problems and interests. Out of one
hundred thirty-five possible differences only twenty-four showed
differences which were greater than two times their standard error.
The rank order correlation between the order of the ranks for problems
of the most happy and least happy for the high-school group was
.513; for the college group .382; and for the adult group .687.
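
A rank order correlation of the kind reported above may be sketched with the familiar rank-difference formula, 1 - 6Σd²/n(n² - 1). The two rankings below are hypothetical, not the study's data, and the formula as written assumes no tied ranks; whether ties occurred in the original rankings is not stated.

```python
def spearman_rho(ranks_a, ranks_b):
    """Rank-difference correlation; assumes untied ranks 1..n in both lists."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


# Hypothetical rank orders of the fifteen items as problems for two groups.
most_happy = list(range(1, 16))
least_happy = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 15]

rho = spearman_rho(most_happy, least_happy)
print(round(rho, 3))
```

Identical rankings give +1 and exactly reversed rankings give -1, so coefficients such as .382 to .687 indicate moderate agreement between the two orders.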

An inspection of the differences reveals several important rela- 
tionships. The data verify the age-old dictum of philosophy that 
happiness is not found by seeking for it directly. From our data 
it may be seen that the happy find mental hygiene to be a problem 
less than do the unhappy. On the other hand the happy adults take 
their civic obligations more seriously than unhappy adults, while 
unhappy college students who do not yet have civic obligations find 
civic problems much less interesting than do the happy students. 
On the high-school level the happy find their study habits more of a 
problem than do the unhappy, while on the other hand the unhappy 
are more concerned with mental hygiene and family relationships. 
In the college where emancipation from the family has been accom- 
plished to some degree the happy find getting along with other people 
less of a problem. The evidence is clear that the happy are most 
concerned with facing reality, while the unhappy are most concerned 
with their own unhappiness and their intimate relations with others. 

This same distinction applies also in the realm of sex. On the 
high-school level the happy find sex less of a problem and the unhappy 
find it more of a problem. On the other hand the unhappy (malad- 
justed) are less concerned with personal attractiveness than the happy. 
The happy express sex openly in their personal relations and in making 
themselves attractive to others; the unhappy are more directly con- 
cerned with sex itself. 

Another interesting difference concerns attitude toward philosophy 
of life. To be sure philosophy is more of an interest than a problem 
at all levels and for all groups. But the happy find it much more 
of an interest than a problem, whereas the unhappy find philosophy of 
life more of a problem and less of an interest. The happy like to 
speculate about things; the unhappy tend to be concerned about goals, 
ambitions, ideals, etc. more intensively. 


CONCLUSION 


Happy and unhappy are remarkably alike in their problems and 
interests. The unhappy do not have peculiar problems but make 
less satisfactory adjustments to their problems. 

The happy are more concerned with affairs outside themselves;
the unhappy are more concerned with themselves and with their
relations to others.

In adolescence with regard to sex the happy are interested in 
making themselves attractive for successful social relationships; 
the unhappy are more directly concerned with sex.

The happy tend to find philosophy of life (ideals, ambitions, 
religion) more of an interest and less of a personal problem than the 
unhappy. 

STUDY HABITS IN GRADES FOUR TO TWELVE*


NOEL B. CUFF 
Eastern Kentucky State Teachers College 


BACKGROUND AND PROBLEMS OF THIS INVESTIGATION 


For several years a lively interest has been manifested among 
school people and psychologists with reference to habits of study and 
their relation to success in school work. The issues involved are 
obviously of great import for a psychology of learning, since they are 
part and parcel of the problems relative to economy in and guidance of 
learning. 

Naturally the early prolixity of rules for study grew largely out 
of armchair speculations and rightly or wrongly often lacked pertinent 
scientific support. In other words, McMurry, Whipple, and others 
developed advice from their meditations which they thought would 
cure, if universally adopted, all existing study diseases—a situation 
analogous to that of the physician who prescribed carbonated water 
when it became commercially available for all his patients, regardless 
of the disease, because he thought it ought to be good for them. More 
recently however some fundamental problems concerning study 
techniques have risen from the level of mere rhetorical exposition to 
the status of experimental observations. 

Research studies by Ross, Eurich, Pressey, Wrenn, and about a 
thousand others reveal several procedures for investigating methods 
of learning. Unfortunately, however, many of their generalizations 
relative to efficiency in learning have been derived from reactions of 
less than three hundred university students or from reactions of small 
groups of selected rats. Although the principles of learning derived 
from the investigations on college students and laboratory animals 
are of great interest and importance, they do not necessarily reveal the 
somewhat elusive study habits of pupils in grades four to twelve. 

Attempts, therefore, to discover the habits and conditions which 
may determine success in grades four to twelve seem to be necessary. 
Hence, in this study, the following problems have been attacked: 
(1) What are the good and bad study activities frequently engaged
in by pupils in grades four to twelve? (2) How do the methods of
study followed by pupils of superior achievement differ from those
followed by pupils of inferior achievement? (3) How do study habits
of bright pupils differ from those of dull pupils? (4) How do the
methods of study of the youngest and the oldest pupils in given groups
differ? (5) How does the general quality of study habits (total score)
compare for various grades? (6) What are the most important
study habits in order of merit as determined by different criteria?

* Preliminary report read at the Psychology Section of the Ninety-seventh
Meeting of the American Association for the Advancement of Science, January 1,
1936, St. Louis, Mo.


METHODS AND RESULTS OF THIS INVESTIGATION

A series of how-to-study meetings initiated by Supt. Kirkpatrick
and the Paris (Ky.) City Teachers Association was the inception of
this research undertaking and led to a survey of over five hundred
relevant contributions in an effort to determine the principles of study
generally agreed upon as important. In this way the one hundred
twenty-five statements about study habits most often cited were
compiled. The ten commandments, in rank order from this total,
agreed upon by from forty-two to ninety-four per cent of the studies
as important prerequisites for effective work are shown in Table I:

TABLE I. THE TEN RULES FOR STUDY MOST FREQUENTLY LISTED IN FIVE
HUNDRED CONTRIBUTIONS

Rank | Rule | Per cent of studies emphasizing rule
 1 | Have a definite time for the study of specific lessons | 94
 2 | [illegible] | 83
 3 | [illegible] | 72
 4 | [illegible] | 66
 5 | Skim over material before reading it in detail | 55
 6 | Work out individual examples to illustrate general rules and principles | 52
 7 | Seek a favorable environment for study | 50
 8 | Have a clear notion of the task before beginning | 48
 9 | Review previous work before beginning an advanced [illegible] | 47
10 | Recite silently immediately after reading a lesson | 42

A casual examination of the ten excerpts given in Table I suggests 
that they have been tinged by research at the college level; that only 
six rules are considered important in over fifty per cent of the five hun- 
dred studies; and that they may not be the golden rules for elementary- 
school children. For example, the rule relative to taking notes on 
lectures is probably more important for university than for elementary
subjects. Hence, it also follows that judgment upon the importance 
of study rules for public schools should be suspended until the bounds 
of our knowledge have been in some measure enlarged. 

In hope of securing data that would be at least a slight contribution 
to our knowledge of study activities in elementary and secondary 
grades, a question list containing the seventy-five most commonly 
advocated items was prepared and given to about one thousand chil- 
dren in the Paris City Schools and to approximately two hundred fifty 
children in the Training Schools of Eastern Kentucky State Teachers 
College. The percentages of outstanding pupils (i.e., above Q₃ in
achievement, in test-intelligence,* and in chronological age) answering
questions yes (or no) were computed. The percentage of such pupils
answering each question in a given way in excess of the percentage of
poor pupils (i.e., below Q₁ in chronological age, achievement, and
test-intelligence) was also found. In other words, the distinctive 
study techniques of the efficient, bright, and under-aged pupils were 
considered more potent than the methods characteristic of opposite 
groups. One cannot conclude, however, that all the methods of the 
inferior groups should be avoided, because about seventy per cent of 
these individuals “usually look up in a dictionary new words.”
Furthermore, one must avoid the fallacy of assuming that all common
habits of the superior groups are desirable, since less than half of the
pupils composing these groups “recite silently immediately after
reading a lesson.” They might, of course, be better students if they
used this suggestion. In addition, the differences between the groups 
on such items as using a dictionary are insignificant and statistically 
unreliable. Nevertheless, there are fifty-four differentially significant 
questions (the statistical analyses are indicated in Table III). 
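The comparison described above can be sketched in a few lines. This is a minimal illustration only, not the writers' actual procedure: the pupil records, the question label, and the helper names `percent_answering` and `differential_percentage` are hypothetical, and membership in the upper and lower groups is assumed to have been settled beforehand.

```python
# Hypothetical sketch: percentage of "outstanding" pupils answering a
# question in a given way, in excess of the percentage of "poor" pupils.

def percent_answering(pupils, question, answer):
    """Percentage of the given pupils whose reply to `question` equals `answer`."""
    if not pupils:
        raise ValueError("group is empty")
    hits = sum(1 for p in pupils if p["answers"].get(question) == answer)
    return 100.0 * hits / len(pupils)

def differential_percentage(upper, lower, question, answer="yes"):
    """Excess of the upper group's percentage over the lower group's."""
    return (percent_answering(upper, question, answer)
            - percent_answering(lower, question, answer))

# Invented records for illustration only (not data from the study).
upper_group = [
    {"answers": {"complete_sentences": "yes"}},
    {"answers": {"complete_sentences": "yes"}},
    {"answers": {"complete_sentences": "yes"}},
    {"answers": {"complete_sentences": "no"}},
]
lower_group = [
    {"answers": {"complete_sentences": "yes"}},
    {"answers": {"complete_sentences": "no"}},
    {"answers": {"complete_sentences": "no"}},
    {"answers": {"complete_sentences": "no"}},
]

diff = differential_percentage(upper_group, lower_group, "complete_sentences")
print(f"differential percentage: {diff:.0f}")
```

A positive value means the habit is reported more often by the upper group; whether a difference of a given size is statistically reliable is a separate question that this sketch does not address.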

The ten study items that rank highest in the battery of fifty-four
are shown in Table II. It is evident that the study rule most
differentially significant for public-school children is: “Have a clear
notion of the task before beginning the work of a particular study
period”; the survey of literature cited above ranks this item eighth.
The second statement in order of merit in this study, “make complete
sentences while writing,” is not in the first ten as determined by
pooling previous findings. Hence, the formulated methods of work
which are considered very significant for university students may not
be valuable at lower levels.





* Henmon-Nelson and National IQ’s for training schools only. 
TABLE II. THE TEN MOST SIGNIFICANT METHODS OF STUDY CHARACTERISTIC
OF HIGH-SCHOLARSHIP, MOST-INTELLIGENT, AND UNDER-AGED GROUPS
MORE OFTEN THAN OF OPPOSITE GROUPS IN TWELVE HUNDRED FIFTY
PUPILS OF GRADES FOUR TO TWELVE

Composite rank | Question | Answer by good more often than by poor students
 1 | Have you a clear notion of the task before beginning the work of a particular study period? | Yes
 2 | Do you make complete sentences while writing? | Yes
 3 | Do you seek to master all the material as progress is made from lesson to lesson? | Yes
 4 | Do you grasp the meaning of a chart or table without [illegible]? | Yes
 5 | Do you try to interrupt work at a natural break in the printed material, such as at the end of a chapter? | Yes
 6 | Do you take notes while reading or studying? | No
 7 | Do you work out individual examples to illustrate general rules and principles? | Yes
 8 | Do you provide yourself with materials required? | Yes
 9 | Do you use facts learned in one class to aid in preparing for another? | Yes
10 | Do you read each topic in a lesson separately until it is [illegible]? | Yes

Table III presents the extent to which the (E.K.S.T.C.) Training
School pupils practice two of the study rubrics listed in Table II.
The data for the second question, for example, clearly show that
eighty-nine per cent of the under-aged pupils claim to “make complete
sentences while writing.” The corresponding figures for the brightest
and the highest-achievement groups are respectively eighty-two and
forty-four per cent. Moreover, the differential percentages of 21, .01,
and 15 agree noticeably well in showing that pupils in the upper
quartiles follow this rule more frequently than do the pupils in the
lower quartiles. They suggest too that the item discriminates most
sharply between the youngest and oldest (21 per cent); next between
the brightest and dullest (15 per cent); and least between the best and
poorest achievers (.01 per cent). It will be noted too that pupils in
grades ten to twelve “make complete sentences” more often (80 per
cent) than do individuals in grades four to six (20 per cent). The
results for Question ten show, however, that senior high pupils (-07 per
cent) study topics separately less frequently than do fourth- to sixth-
grade pupils (27 per cent). Consequently, it is evident that the
differential values of study integers at secondary levels (grades ten
to twelve) may not indicate their significance at elementary levels
(grades four to nine).


TABLE III. PERCENTAGES OF PUPILS PRACTICING TWO STUDY RULES AND THE
DIFFERENCES IN TERMS OF PERCENTAGES FOR OPPOSITE GROUPS

A comparison of the results cited and of data not presented here
also shows that the percentage of successful pupils answering a question 
in excess of the percentage of failing pupils does not indicate how defi- 
nitely the item discriminates between the bright and dull or between 
the youngest and oldest cases—fifty correlations computed to deter- 
mine more exactly the relationship between the different criteria 
range from .02 to .90 with a median of .64. Furthermore, these 
correlations probably justify the observation that the extent to which 
the youngest pupils have a study habit in excess of the oldest is the 
best index of its discriminative value. In fact, the results for this 
hitherto neglected criterion lead us to quote Terman’s conclusion that 
“it is better to consult the birth records in the class register than 
to ask the teacher’s opinion.” 

The percentiles presented in Table IV yield some striking con- 
clusions. For example, the median scores for grades four (median 39) 
to twelve (median 31) show neither progressive nor reliable differences. 
This suggests that study habits are formed early as a result of trial and 
error or of other subtle, selective, and fixative factors and that there- 
after the vectors tend to remain constant unless effective remedial 
programs planned by alert teachers result in changes. The compari- 
sons of medians for the pupils in the high and low achievement groups 
also reveal that in every grade the superior pupils have better study
habits than the failing pupils. The averages for the composite groups 
are respectively thirty-five and thirty. Furthermore, in any given 
grade some individuals have two or three times more good study 
methods than do the worst cases; one student (P99) per one hundred 
in grade four has forty-five good methods while another (P1) has
thirteen. 


TABLE IV. INVENTORY OF FIFTY STUDY-HABITS OF TWELVE HUNDRED FIFTY
PUPILS IN GRADES FOUR TO TWELVE

                                                  Grades
Percentiles                     |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | Ave.
99                              | 45 | 45 | 45 | 44 | 48 | 44 | 44 | 43 | 44 | 44
75                              | 42 | 37 | 41 | 35 | 36 | 38 | 36 | 33 | 35 | 36
50  Low scholastic standing     | 35 | 33 | 35 | 29 | 33 | 30 | 30 | 29 | 29 | 30
50  High scholastic standing    | 40 | 35 | 42 | 33 | 36 | 40 | 35 | 31 | 34 | 35
50  Total scholastic standing   | 39 | 33 | 38 | 32 | 33 | 34 | 32 | 31 | 31 | 33
25                              | 35 | 28 | 35 | 28 | 29 | 31 | 30 | 29 | 29 | 29
 1                              | 13 | 18 | 23 | 17 | 14 | 20 | 18 | 19 | 19 | 18
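
The statements drawn from Table IV can be checked directly against its entries. The sketch below is offered only as an illustration: the dictionary layout and the names `TABLE_IV` and `by_grade` are assumptions of this example, and the first-percentile entry for grade nine, printed with a stray mark, is read here as 20.

```python
# Table IV as printed: number of good study habits (out of fifty) at
# selected percentiles, by grade. Keys and layout are this sketch's own.
GRADES = [4, 5, 6, 7, 8, 9, 10, 11, 12]
TABLE_IV = {
    "P99":       [45, 45, 45, 44, 48, 44, 44, 43, 44],
    "P75":       [42, 37, 41, 35, 36, 38, 36, 33, 35],
    "P50_low":   [35, 33, 35, 29, 33, 30, 30, 29, 29],
    "P50_high":  [40, 35, 42, 33, 36, 40, 35, 31, 34],
    "P50_total": [39, 33, 38, 32, 33, 34, 32, 31, 31],
    "P25":       [35, 28, 35, 28, 29, 31, 30, 29, 29],
    "P1":        [13, 18, 23, 17, 14, 20, 18, 19, 19],  # grade 9 read as 20
}

def by_grade(row):
    """Return one row of the table as a {grade: value} mapping."""
    return dict(zip(GRADES, TABLE_IV[row]))

high, low = by_grade("P50_high"), by_grade("P50_low")
# Do the high-standing medians exceed the low-standing medians in every grade?
high_exceeds_low = all(high[g] > low[g] for g in GRADES)

# Ratio of the best (P99) to the worst (P1) case within each grade.
p99, p1 = by_grade("P99"), by_grade("P1")
ratios = {g: p99[g] / p1[g] for g in GRADES}

print("high > low in every grade:", high_exceeds_low)
for g in GRADES:
    print(f"grade {g}: P99/P1 = {ratios[g]:.1f}")
```

The ratios run from roughly two to roughly three and a half, which is the sense in which some pupils have "two or three times more" good methods than others.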


SUMMARY OF FINDINGS RELATIVE TO STUDY HABITS 


The techniques used in this study provide data, notwithstanding 
the complexities and inherent difficulties of the problems, from which 
the following partial interpretative conclusions can be drawn: 

1. We cannot say categorically that a study habit is either good
or bad. For example, the rule to take notes on lectures ranks third
theoretically but fails to rank experimentally in the first ten rules.

2. Only two of the ten most important study rules for grades 4 
to 12 in order of merit appear in the ten which rank highest in the list 
prepared from studies frequently based on college subjects. 

3. A given rule may be followed more exactly by inferior than by
superior groups and yet not be a faulty habit. For example, six per
cent more of the pupils below Q1 “keep a list of unfamiliar words”
than do those in the highest groups.

4. Several criteria, including the differences between the reactions
of the youngest and the oldest in given grades (the best criterion
used in this study), should be used in evaluating study techniques.
To illustrate, the item which ranked first according to this index
ranked respectively third and fourth based on two other standards.

5. A large number of cases should be used in making an inventory 
of desirable study habits. A preliminary analysis of two hundred fifty 
cases yielded a list that contained only sixty-four per cent of the items 
in the final form based on twelve hundred fifty cases. 

6. Bright, young, and superior pupils in every grade have quanti- 
tatively (17 per cent) more helpful study habits than do others. 

7. There are wide variances in the study methods of pupils in any 
given grade—some individuals have about three times more good 
habits than do others; the range is from fourteen to forty-eight for 
grade eight. 

8. The common a priori assumptions of teachers and parents 
that many pupils do not know how to study are supported by this 
investigation. Over one-third of the study habits of half the pupils 
are defective. 

9. Methods of study apparently crystallize in the elementary 
grades and do not as a rule improve appreciably thereafter, as indicated 
by the facts (1) that fifth-grade pupils have thirty-three good study 
habits and (2) that grades four to twelve inclusive have an average of 
thirty-three. 

10. A carefully derived study-habits inventory aids in finding the
pupils in need of special guidance (twenty-five per cent of the cases
have twenty-nine or fewer desirable study techniques out of a possible
fifty) and helps identify for remedial work the good and bad study
habits of individual cases.
A NOTE ON SCORING THE REARRANGEMENT TEST


VERNER M. SIMS 
University of Alabama 


In the April, 1934, issue of THE JOURNAL OF EDUCATIONAL PSYCHOLOGY
(pp. 251-257), we presented a simple and practical formula
for scoring the rearrangement test. This formula was based upon the 
following assumptions: (1) That there should be a straight-line, but 
inverse, relation between the score and the sum of the deviations from 
the true order; (2) that a perfect score should be equal to the number 
of items in the set, and (3) that the worst possible arrangement should 
receive a zero score. This formula was 


\[ S = n - \frac{2d}{n} \tag{1} \]


where n is the number of items in the set and d is the arithmetical sum 
(without regard to sign) of the student’s deviations from the true 
order. The formula fulfilled our requirements exactly when the set 
was made up of an even number of items. When the number of 
items was odd there was an error which was negligible when scores 
were expressed as whole numbers. (See the original article.) 
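
A short sketch may make the behavior of this formula concrete. It assumes formula (1) has the form S = n - 2d/n, which is the straight-line rule satisfying the three assumptions listed above when n is even; the function names `sum_of_deviations` and `score_formula_1` are hypothetical and do not come from the original article.

```python
def sum_of_deviations(student_order, true_order):
    """Sum, without regard to sign, of each item's displacement from its true position."""
    if len(student_order) != len(true_order) or set(student_order) != set(true_order):
        raise ValueError("orders must contain the same items")
    true_pos = {item: i for i, item in enumerate(true_order)}
    return sum(abs(i - true_pos[item]) for i, item in enumerate(student_order))

def score_formula_1(student_order, true_order):
    """Assumed form of formula (1): S = n - 2d/n."""
    n = len(true_order)
    d = sum_of_deviations(student_order, true_order)
    return n - 2 * d / n

key = ["A", "B", "C", "D", "E", "F"]
print(score_formula_1(key, key))                              # perfect order
print(score_formula_1(list(reversed(key)), key))              # complete reversal
print(score_formula_1(["B", "A", "C", "D", "E", "F"], key))   # one adjacent swap

odd_key = ["A", "B", "C", "D", "E"]
print(score_formula_1(list(reversed(odd_key)), odd_key))      # small residue for odd n
```

With six items the complete reversal gives d = 18 and a score of zero; with five items the reversal leaves a residue of 1/n, which is the negligible error for odd sets mentioned above.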

Conrad* has recently called attention to the fact that adequate 
scoring of the rearrangement test should make a correction for chance 
or guessing. He presents a formula for finding the mean chance 
arrangement, and then develops a scoring system which assumes 
that a maximum score should receive a score equal to the number of 
items in the set, and the mean chance arrangement should receive a 
zero score; but, because the number of possible arrangements increases 
rapidly as one approaches the value of the mean chance arrangement, 
instead of assuming a straight-line relationship between deviation 
from the true order and score, he assumes a curvilinear relation, 
and more or less arbitrarily gives added weight to the smaller devia- 
tions. There is some justification for his assumption, but the crude- 
ness of the measurement will not justify such great refinement; and, 
what is a more serious criticism of his method of scoring, it is not 
practical because it demands that you have his table before you when 
using the rearrangement test. 

* Journal of Educ. Psy., April, 1936, pp. 241-252.

If you make the first two assumptions that were made in our
original formula (1) and change the third assumption to “the mean
chance arrangement should receive a zero score,” it is possible to
get a very simple and usable formula which will make provision for
guessing.

Conrad’s formula for finding the mean chance deviation was

\[ M_c = \frac{\sum t\,|d|}{n!} \tag{2} \]

where “d is the difference (without regard to sign) between the correct
numbering and the possible. [The same use as in formula (1) above.]
t is the number of sets in which each particular, given difference
appears.” And, n is the number of items in the set.
But the mean chance deviation may be more simply expressed by
the formula

\[ M_c = \frac{\sum d}{\text{Number of possible combinations}}. \tag{3} \]

As Conrad points out, the number of possible combinations equals
$n!$, and it can be shown that for a given length set*

\[ \sum d = \frac{n!\,(n^2 - 1)}{3}. \tag{4} \]

Consequently,

\[ M_c = \frac{n^2 - 1}{3}. \tag{5} \]

The formula

\[ S = M_c - d \tag{6} \]

assumes a straight-line relation between deviation and score, but the
scores vary from $M_c$ to zero, rather than from n to zero. However,
it may be transformed by substituting $\frac{n^2 - 1}{3}$ for $M_c$ and
dividing by the ratio $\frac{n^2 - 1}{3n}$. Simplified, this becomes

\[ S = n - \frac{3dn}{n^2 - 1}. \tag{7} \]





* The writer discovered this relationship and validated it by constructing from
Conrad’s findings a table of progressions involving n, mean deviations, and number
of combinations, from these values determining a progression of $\sum d$, and then
relating the progressions $(n^2 - 1)$ to these values.
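
The same relationship can also be checked by direct enumeration for small sets. The sketch below is not the writer's tabular method; it simply lists every arrangement, sums the deviations, and compares the total with $n!\,(n^2 - 1)/3$ and the mean with $(n^2 - 1)/3$, as relations (4) and (5) are read here. The function name `total_deviation` is an assumption of this example.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def total_deviation(n):
    """Sum of d (absolute displacements) over all n! arrangements of n items."""
    return sum(
        sum(abs(i - p) for i, p in enumerate(perm))
        for perm in permutations(range(n))
    )

for n in range(2, 8):
    total = total_deviation(n)
    mean_chance = Fraction(total, factorial(n))
    assert 3 * total == factorial(n) * (n * n - 1)     # relation (4)
    assert mean_chance == Fraction(n * n - 1, 3)       # relation (5)
    print(n, total, mean_chance)
```

Exact fractions are used so that the comparison does not depend on floating-point rounding.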
This formula conforms to our assumptions but it is rather too
complicated for practical use. Inspection shows that the last term is
the product of $\frac{3d}{n}$ and $\frac{n^2}{n^2 - 1}$. But
$\frac{n^2}{n^2 - 1}$ is a ratio which approaches
unity as n is increased, and is always a negligible factor if students’
scores are to be expressed as whole numbers. Consequently, for
practical use we recommend

\[ S = n - \frac{3d}{n}. \tag{8} \]


For most economical use, one should first find $M_c$ for the set being
scored (formula (5) above), substitute the even numbered values
between zero and $M_c$ in formula (8), then prepare a simple table
showing correspondence between deviations and score, all deviations
greater than $M_c$ being treated as chance ones and scored zero. Scoring
then involves finding the sum of the student’s deviations from the
key and reading the score from the table.
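
The table-building procedure just described might be sketched as follows. The sketch assumes formula (5) reads $M_c = (n^2 - 1)/3$ and formula (8) reads $S = n - 3d/n$; the function names are hypothetical, and rounding to whole numbers uses Python's built-in `round`, which rounds exact halves to the nearest even integer.

```python
from fractions import Fraction

def mean_chance_deviation(n):
    """Formula (5) as read here: M_c = (n^2 - 1) / 3."""
    return Fraction(n * n - 1, 3)

def scoring_table(n):
    """Map each even deviation sum d from 0 up to M_c to a whole-number score, S = n - 3d/n."""
    m_c = mean_chance_deviation(n)
    table = {}
    d = 0
    while d <= m_c:
        table[d] = round(n - Fraction(3 * d, n))
        d += 2  # the sum of deviations of any arrangement is always even
    return table

def score(d, table):
    """Read a score from the table; deviation sums beyond M_c are scored zero."""
    if d < 0 or d % 2:
        raise ValueError("a deviation sum must be a non-negative even integer")
    return table.get(d, 0)

table = scoring_table(10)
for d, s in table.items():
    print(f"d = {d:2d}  score = {s}")
print("d = 40  score =", score(40, table))
```

Only even values of d need tabulating because the deviations of any rearrangement always sum to an even number, which keeps the table short enough to read at a glance.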
A NOTE ON COLOR


RICHARD G. CLARKE 
Allegheny College 


All visual images are the interpretation given by the organism to 
the electrical and chemical changes in the brain tissue caused by 
electrical impulse originating in an oxidation of photo-sensitive 
molecules in the retina of the eye by radiant energy of a definite vibra- 
tion frequency. In the higher animals the retina is developed by 
evolutionary process until it can distinguish not only light and dark, 
but also frequency, interpreting the varying frequencies by a process 
we might term “color discernment.” This much is evident and is well
known to psychologists. I wish to develop the physico-chemical 
phase of the question in a more satisfactory manner than has been 
popularly presented. 

First, let us investigate the nature of light itself. The most recent 
and best substantiated theories do not consider electro-magnetic 
impulse giving rise to light when a retina is affected as “a wave motion
in an imponderable ether” as the physicists of 1900 thought, but as a 
series of material particles of extremely tiny mass traveling in a 
vibratory path at a speed of two hundred ninety-nine thousand kilo- 
meters per second through the interstices in the rest of the matter in 
the universe. Electro-magnetic energy is material in the same sense 
that atomic structures are called material—made of the same “stuff” 
only in a different aspect. Different impulses differ only in the 
number of particles they contain, the energy and vibration frequency 
varying with the number of particles, or “fundamental quanta”
in the impulse. 

This high speed group of particles or “photon” flies through matter
until it enters an atom and strikes an electron within that atom. If
the atom’s electron is receptive to this particular photon it is absorbed
and the contained energy is taken up by the electron. This may,
under proper conditions, cause the electron to be repulsed from
the atom, leaving an “oxidized” atom or ion. The ion immediately
tries to get an electron back so that it may be reduced to an
atom. This is the principle of the quantum theory applied to the
photo-electric effect, with which we are concerned in color and visual
sensation.
Suppose a molecule in the retina of the eye has an atom whose
electron is selectively tuned to a photon of wave length of .400 microns.
When such a photon impinges upon this electron, the electron absorbs 
the energy and leaves the atom and molecule. This is sufficient, 
in an unstable molecule, to break up the whole thing, giving rise to a 
considerable disturbance in the surrounding matter. In the eye 
this disturbance is picked up by neural fibers as an electrical impulse. 
This sets off a chemical reaction in the nerve fiber, causing a neural 
impulse to be sent to the brain, where, in the occipital lobe, certain 
cells undergo chemical change to which they are accustomed, and 
from habit the owner of the cells says, “I see a deep violet light.”
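
For orientation, the frequency and energy of such a photon can be worked out from its wave length. The sketch below relies on the standard relations ν = c/λ and E = hν, which this note does not itself state, and uses the modern defined values of the constants; the function names are assumptions of this example.

```python
# Physical constants in SI units (values exact by definition).
PLANCK_H = 6.62607015e-34        # joule-seconds
LIGHT_SPEED = 299_792_458.0      # metres per second
ELECTRON_VOLT = 1.602176634e-19  # joules per electron-volt

def photon_frequency(wavelength_microns):
    """Frequency in hertz of light of the given wave length (in microns)."""
    return LIGHT_SPEED / (wavelength_microns * 1e-6)

def photon_energy_ev(wavelength_microns):
    """Energy in electron-volts of one photon of the given wave length."""
    return PLANCK_H * photon_frequency(wavelength_microns) / ELECTRON_VOLT

print(f"{photon_frequency(0.400):.3e} Hz")
print(f"{photon_energy_ev(0.400):.2f} eV")
```

A wave length of .400 microns lies at the violet end of the visible range, in keeping with the sensation described.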

From all knowledge of photo-chemical processes the different 
structures of the retina are selective, i.e., each one picks up and
absorbs light of a particular wave length and no other. 

When no stimulus affects the eye the result of no impulse to the 
brain is known as “black.” This can be proved by several means: 
When the eye is in perfect absence of light the sensation is black. 
An after image of black is not white—it is colorless. A retina examined 
chemically when it has not been exposed to any light impulse for a 
considerable length of time shows no sign of decomposition due to 
light effects. 

There is good evidence for believing all visual sensations save the 
negative after image to be katabolic; as a rule protoplasm is broken 
down in the body by oxidation. Photo-electric effects result in oxida- 
tion. Thus the light sensitive molecules being broken down by 
photo effects would seem to be a katabolic process. Shock by a blow 
on the eye or an electrical discharge into the eye results in a brilliant 
light. The blow breaks up the delicate photo-sensitive molecules 
of the retina (one can hardly visualize a chemical change going to an 
unstable product caused by a violent stimulus. Changes to such 
products might be termed “subtle”). Remember the light seen at a
blow on the eye has an after image caused by the restoration of the 
destroyed molecules. Other evidence could, I think, be found by 
experiment along this line. 

The after image is caused by the anabolic process of restoring the 
parts of the retina broken down by the photo-stimulus. The evidence 
for this is that colored after images are complements of the stimulus. 
As anabolism is the reverse of katabolism, the effect on the brain would 
be reversed. So it is: the visual after image color is the complementary
color of the impulse.
To summarize: Color sensation is due to distinction by the observer
of different photons varying in energy. This is caused by selective 
photochemical action of retinal molecules. Black is not an affector 
of retinal molecules—it is a lack of sensation, not a sensation. Retinal 
response to external stimuli is katabolic. After images are anabolic 
processes restoring the destroyed retinal molecules. 
THE EFFECT OF CHANGED RESPONSES IN
TRUE-FALSE TESTS 


GEORGE E. HILL 
University of Pennsylvania 


Many teachers have been advising their students to give preference 
to their second judgments on true-false tests if these judgments differ 
from the initial impression. This advice has been based in part on 
studies tending to show that second judgments are the more often 
correct. Lamson¹ found that the ratio of correct to incorrect changes
when second judgments were given preference was about two to one. 
In her investigation, however, the students were asked to identify 
any changes in answers and were thus placed in the position of being 
very conscious of all such changes. 

Mathews’² results were much the same; but his students marked
the true-false items with plus and minus signs, making it difficult to
identify all changes. Lowe and Crawford,³ from an analysis of answers
made under normal testing conditions, report that two-thirds of the 
changed answers were changed correctly. 

The tests analyzed in the present study were given under ordinary 
testing conditions in three different Education courses: Character 
Education, Methods of Teaching in High Schools, and Extra-curricular 
Activities. Nothing was said to the students about changing their 
responses. They had plenty of time and were told to check over their 
papers when finished. In no case did any student have insufficient 
time for this purpose, although not all of them so made use of the time. 
The students had written instructions not to check an item unless 
reasonably sure about it, the R-W marking method being used. All 
students were required to use pencil and to check the items plus or 
zero. There were very few cases in which a changed answer was 
difficult to identify. All papers were analyzed by the same person, 
the writer. 

1 Lamson, E. E.: “What Happens When the Second Judgment is Recorded in a
True-false Test?” Jour. Ed. Psy., Vol. XXVI, March, 1935, pp. 223-227.
2 Mathews, C. O.: “Erroneous Impressions on Objective Tests.” Jour. Ed.
Psy., Vol. XX, April, 1929, pp. 280-286.
3 Lowe, M. L., and Crawford, C. C.: “First Impressions Versus Second Thought
in True-false Tests.” Jour. Ed. Psy., Vol. XX, March, 1929, pp. 192-195.

Eleven different true-false tests taken by five hundred two third-
and fourth-year college students were analyzed. The tests ranged in
number of items from fifty to one hundred, most of them being seventy-
five items in length. In all, there were 33,329 answers analyzed,
50.5 per cent true and 49.5 per cent false. Of these, seven hundred
eighty-six were blanks or omissions, this being due to the directions
issued and the method of marking, which was explained to the students.
That there was no guessing on the test is doubtful, however.

A general summary of the 33,329 answers follows:

                                                   Number   Per cent
Blanks or omissions...............................    786      2.05
Items correctly answered.......................... 27,742     83.55
Items incorrectly answered........................  4,801     14.40
Items changed to correct answer...................    491      1.47
Items changed to incorrect answer.................    343      1.03
Total items changed...............................    834      2.50

A more detailed analysis of the changed answers is given below: 


Ratio of correct to incorrect changes........................ 1.43 to 1
Per cent of changed answers correct.......................... 58.87
Per cent of changed answers incorrect........................ 41.13
Per cent of changes made from true to false.................. 53.72
Per cent of changes made from false to true.................. 46.28
Per cent of correct changes made from true to false.......... 60.47
Per cent of incorrect changes made from true to false........ 39.53
Per cent of correct changes made from false to true.......... 56.99
Per cent of incorrect changes made from false to true........ 43.01

Per cent of all correct changes that were from true to false... 55.19
Per cent of all correct changes that were from false to true... 44.81
Per cent of all incorrect changes that were from true to false. 51.60
Per cent of all incorrect changes that were from false to true. 48.40
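
The percentages above can be recomputed from counts, as in the following sketch. Only the two totals (491 correct and 343 incorrect changes) are reported in this study; the four-way split by direction is not reported and is inferred here from the printed percentages, so it should be taken as an assumption. With that split the printed figures are reproduced to within a few hundredths of a percentage point.

```python
# Counts of changed answers by direction and outcome. The totals 491 and
# 343 are reported; the split by direction is inferred, not reported.
changes = {
    ("true->false", "correct"): 271,
    ("true->false", "incorrect"): 177,
    ("false->true", "correct"): 220,
    ("false->true", "incorrect"): 166,
}

def pct(part, whole):
    """Percentage rounded to two decimals, as in the printed table."""
    return round(100.0 * part / whole, 2)

def count(direction=None, outcome=None):
    """Number of changes matching the given direction and/or outcome."""
    return sum(
        n for (d, o), n in changes.items()
        if direction in (None, d) and outcome in (None, o)
    )

total = count()
correct = count(outcome="correct")
incorrect = count(outcome="incorrect")

print("ratio correct : incorrect =", round(correct / incorrect, 2), "to 1")
print("changed answers correct   =", pct(correct, total))
print("changed answers incorrect =", pct(incorrect, total))
for direction in ("true->false", "false->true"):
    n_dir = count(direction=direction)
    print(direction, "share of all changes      =", pct(n_dir, total))
    print(direction, "correct within direction  =",
          pct(count(direction, "correct"), n_dir))
    print(direction, "share of all correct      =",
          pct(count(direction, "correct"), correct))
    print(direction, "share of all incorrect    =",
          pct(count(direction, "incorrect"), incorrect))
```

Keeping the counts in one table and deriving every percentage from it makes clear which figures share a denominator, which is the point most easily lost in the printed list.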


It would appear that there is a slightly better chance of being right 
than wrong if one changes his response to an item when his second 
judgment differs from his first impression. This study does not, how- 
ever, reveal that the chances of being right are as great as suggested 
by the three studies already quoted. It would seem that the chances 
of making a correct change are better when the change is from true 
to false than when it is from false to true. Not only do the percentages 
listed above support this conclusion; but, also, false items were much 
more often missed than true items. Of the total number of items 
missed (4801) 36.24 per cent were true and 63.76 per cent false. 
The fact that over eighty-three per cent of all responses were correct
while fifty-nine per cent of the changed responses were correct would
suggest that the changed items were those about which the students
were least sure. This is borne out by the fact that the product-moment
coefficient of correlation between the number of times items were changed
and the number of times they were missed was +.3859 ± .0075. The
median percentage of students who missed those items that were not 
changed by anyone was 4.05 per cent; while the median percentage 
of students who missed those items that were changed by one or more 
students was 14.50. 


SUMMARY 


1. Analysis of 33,329 responses in true-false tests given under 
regular classroom conditions shows that relatively few students change 
a recorded answer, 2.5 per cent of all answers being changed. 

2. Changed answers are more apt to be right than wrong but are 
much more often wrong than the answers to items that are not changed. 

3. The subjects of this study benefitted more in changing an answer 
from true to false than from false to true. 

So far as this study indicates, there is not much advantage to be 
gained by changing one’s answers on a true-false test. All the changes 
made by the students in this study raised the average score by less 
than one percent. While the analysis of these responses did not reveal 
specific facts to this end, it would appear that some students change 
their answers more often than others, and poor students more often 
than good students. This was shown by Mathews’ study. 

It appears that, if true-false tests are carefully constructed and if 
students are well prepared to take them, they need bother little about 
changing their answers. If, after once recording an answer, second
thought suggests it should be changed, probably little harm or good will
be done by doing so. It is doubtful, however, that students should be
encouraged to make snap judgments and then change their answers 
simply because the ‘‘chances’”’ seem to be in favor of such changes 
being correct. 
BOOK REVIEWS


ABRAHAM A. LOW. Studies in Infant Speech and Thought. Part I.
University of Illinois Bulletin XXXIII, 1936, No. 39. Illinois
Medical and Dental Monographs, Vol. 1, No. 2. Urbana, 
Illinois: University of Illinois, pp. 71. 


According to the author “the main purpose of the present study 
was to evolve methods rather than to draw conclusions from the 
elaborated material.” In the opinion of the reviewer, however, 
this monograph sets a very bad example in methodology, and while 
future investigators might find herein stimulating suggestions and 
provocative categories for treating data, it is to be hoped that none 
will follow in this author’s erring footsteps in the main methodological 
features. The investigator admits that his method is crude, and 
that it may not be universally applicable, and that the three main 
features of his method—(1) quantitative approach, (2) longitudinal 
section and (3) the study of sentence development—have all been 
attempted separately in other studies. However, there seems to 
be little of value added to the field of methodology by the careless 
application of several techniques in combination which have been 
better applied by other investigators singly and in combination. 

The monograph exemplifies the sort of scientific naiveté which 
frequently occurs when a worker steps out of his own field and attempts 
to do research in another field. In this instance a neuro-psychiatrist 
associated with a medical school is attempting research in genetic 
psychology. He seems to be familiar chiefly with the German and 
French literature on the topic and with a few of the early American 
biographical studies, as judged from his bibliography of forty-one 
titles of which only ten are American. Over half of the total list 
were published before 1915 and only five since 1930. Later and more 
comprehensive American studies are dismissed as a group because 
they are “based on samplings, not on an analysis of continuous 
records” and because they “attempted to devise an all-around measure
of language development” for which purpose they selected the length of 
sentence. While it is true that the authors thus criticized did use 
this quantitative measure and found it a very good developmental 
index, it should be pointed out that each also conducted one or several 
of the following types of analysis in addition: Sentence structure, 
vocabulary, functional use of sentences, and parts of speech. 
Data for the present monograph were secured by having three
first-born children in favored homes observed by the mothers under 
the author’s supervision over a period of twenty-seven months, 
their ages being fourteen, thirty-two, and thirty-six months at the 
beginning of the observations. The responses of the oldest subject 
were discarded, however, because the author considered that she had 
reached “an age in which linguistic mastery had been definitely
established.” (Thirty-six to sixty-three months.) In the words of
the author, “The material was taken from the daily or almost daily
verbatim record of the utterances of the children, kept by their 
mothers during a period of two years and three months .. . Billy’s 
record consisted of close to 4500 utterances, of which 3815 were 
analyzed. Of Fred’s record of about 8000 utterances 3190 were 
analyzed.” The reviewer is at a loss to determine how the “daily” 
samples were taken, under what circumstances, and for how long, 
or how many responses were recorded for each child daily. Although 
the author repeatedly emphasizes the continuity of his records as a 
vital aspect of the contribution, selection of responses to be recorded 
must have occurred. Even if the total number of responses for all 
three children (approximately 25,000) were analyzed they would 
average only 10.29 responses per child per day! The term “daily” 
records seems also to be something of an exaggeration, since the 
charts have many gaps where only one observation occurred in the 
month and parts of the charts are based on “two or more observations
in the month.” Yet the author severely criticizes studies that
admittedly employ planned “sampling” methods in contrast to his
“continuous” records!

After discarding one of his three cases, the investigator, apparently 
without justification, throws away fifteen per cent of his data on 
one of the remaining cases and sixty per cent of his data on the other 
and devotes the entire monograph to direct comparisons of the two 
cases that differ by eighteen months of chronological age. The 
final analysis therefore is based on only 3.95 responses per day for 
Fred and 4.71 responses per day for Billy. 
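
The reviewer's arithmetic can be retraced roughly as follows. The day count is an assumption of this sketch (twenty-seven months of thirty days each), as is the helper name `per_day`; under that assumption the overall figure and Billy's figure agree with those quoted, while Fred's comes out a shade under the printed 3.95.

```python
DAYS = 27 * 30  # assumed: twenty-seven months of thirty days each

def per_day(responses, children=1, days=DAYS):
    """Average number of responses per child per day."""
    return responses / (children * days)

print("all three children:", round(per_day(25_000, children=3), 2))
print("Billy, analyzed:   ", round(per_day(3815), 2))
print("Fred, analyzed:    ", round(per_day(3190), 2))

# Share of each child's recorded utterances left out of the analysis,
# taking the approximate record sizes of 4500 and 8000 as given.
print("Billy, per cent discarded:", round(100 * (1 - 3815 / 4500)))
print("Fred, per cent discarded: ", round(100 * (1 - 3190 / 8000)))
```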

The author seems to forget this artificial selection of responses for 
analysis when he comments with surprise on the almost identical 
number of sentences and of clauses accumulated (in the analyzed data) 
by the two children, and when he later uses this point, of similar 
quantity of material obtained from the two children, in arguing 
against the desirability of computation of the mean length of sentence. 
Further curtailment of data also occurred since “only those sub-groups
of responses were included that had an incidence of at least ten per 
cent of the total group.” It is thus impossible to get a complete 
composite picture of even the analyzed responses of either child at any 
stage studied. In the analysis of verb forms, those verbs which have 
the same form in both the present and the past tenses were omitted 
from the tabulations! 

The author holds that since “the rate of language development 
varies considerably in different groups of children . . . it is prac- 
tically impossible to compare the degree of maturity . . . of a child 
of two years with that of another child of the same age.” He therefore 
considers two years one month and two years eight months “functionally 
corresponding months” for the two children since they are 
“functionally” equally distant from the period of “complete mastery” 
which is designated by the occurrence of errors in less than five per 
cent of the sentences analyzed. These ages precede by one month 
for each child, the age at which errors noted in the responses analyzed 
dropped below fifty per cent, and the children were therefore considered 
to be entering the period of “preponderant success” in sentence 
formation. The direct comparison of children of the same chrono- 
logical age he considers a basic defect of all other studies in the 
literature and he thinks that his method of comparison on the basis 
of “relative distance from the point of final mastery” obviates this 
difficulty. 

The author exhibits considerable originality in his functional 
approach to the problem of child language and in his ability to discard 
time-honored grammatical classifications in favor of psychologically 
more meaningful categories. His invention of terminology is, however, 
confusing. He uses “equivalent” to mean functionally complete but 
structurally incomplete sentences. Incidentally, “those equivalents 
which merely represented a ‘naming’ activity were omitted from the 
tabulation!” Other terms which are employed include “incomplete 
nexus,” “junction,” “postprepositional articles,” “postcopulative 
articles,” “hypotaxic conjunctions” and many other rarely encountered 
items. 

With the obvious defects in method which have been pointed out 
it seems futile to spend further time and space on a consideration 
of the detailed report of findings on the very incomplete and dubious 
analysis of two cases. Dorothea McCarthy. 

Fordham University. 
J. R. BUTLER AND T. F. KARWOSKI. Human Psychology. New York: 
Pitman Publishing Corp., 1936, pp. XVIII + 447. 


In this text for the first course, human psychology is defined 
as “the study of man’s reactions to his environment.” As with 
most texts, results of both objective and subjective observation are 
considered. About thirty-one per cent of the book is devoted to 
sensation and perception, seventeen per cent to learning and fifteen 
per cent to native equipment in which is included capacity for intelli- 
gent behavior. The other subjects receive less space. Many will 
welcome the interesting chapter on experimental esthetics. 

Although at times the material gives the impression of over- 
simplification, the authors are to be commended for a clear exposition 
in simple language. The section on sensation deserves special mention 
in this respect. Furthermore, there is a desirable emphasis upon 
function rather than structure in considering sense receptors and the 
nervous system. Throughout the book the applications to behavior 
in everyday life are stressed. 

Like so many text writers, however, the authors neglect the recent 
literature on frequency of color-blindness. There is ample evidence 
that seven to eight rather than three to four per cent of men are color- 
blind. Many readers will feel that a more frequent use of figures, 
diagrams, and tabular material to illustrate and to substantiate 
points made in the text would improve the book considerably. Fur- 
thermore, the failure to include materials on development of per- 
sonality and personality measurement leaves the text incomplete. 
Nevertheless, the book should be classed as a good text, one that will 
be welcomed by instructors whose viewpoint is similar to that of the 
writers. Miles A. Tinker. 

University of Minnesota. 


LOUIS V. NEWKIRK AND HARRY A. GREENE. Tests and Measurements 
in Industrial Education. New York: John Wiley and Sons, 
Inc., 1935, pp. X + 253. 


While this book covers the field of educational measurements in a 
commendable manner, it scarcely lives up to its name. More descrip- 
tive than its assigned title would be some such title as Educational 
Measurements for Teachers of Industrial Subjects. The dearth of 
standardized tests, valid and reliable, in the field of industrial educa- 
tion would provide a problem for any author. Newkirk and Greene 
solve this problem by presenting an elementary textbook in educational 
measurements with illustrations, real and hypothetical, from 
industrial subjects. The book is not sufficiently explicit on trade 
tests, mechanical aptitude tests, and manipulative tests in general. 
Perhaps the reason for this condition lies in the fact that industrial 
arts and vocational industrial objectives are far from being agreed 
upon by teachers and supervisors in these fields. One gathers the 
impression from illustrations provided that experimenters in testing 
in industrial education should more thoroughly explore the possibilities 
of the measurement of abilities through the use of mechanical equip- 
ment and give less attention to paper and pencil techniques. 
Standard practices in the use of conventional measurement pro- 
cedures, with both standardized and teacher-made objective classroom 
tests, are adequately treated. The book is easy to read and clear 
in exposition. Despite the limited materials at their disposal, the 
authors have prepared a book that should stimulate further work 
in measurement in industrial education. Glen U. Cleeton. 
Carnegie Institute of Technology. 


L. F. SHAFFER. The Psychology of Adjustment. Boston, Houghton 
Mifflin Co., 1936, pp. XXI + 600. 


During the last three decades there has been growing a semi- 
professional, semi-lay movement known as Mental Hygiene. The 
chief characteristic of this movement has been a vagueness and 
ambiguity regarding its aims and methods. At different times and 
places professional and lay associates have been concerned with 
feeble-mindedness, psychoses, personality aberrations, delinquency, 
crime, drug addiction, and a host of other problems superficially 
unrelated. Physicians, psychiatrists, social workers, psychologists, 
educators, and laymen have each concerned themselves with that 
aspect of the amorphous field which happened to be of immediate 
interest. 

The common element in all of the various problems mothered 
by Mental Hygiene is human behavior. Thus it would appear 
that psychology, as the science of human behavior, should be of 
basic importance. This has not been true. Judging from the 
textbooks and journal literature there has been no real attempt to 
crystallize the psychology of mental hygiene. This lack of a clearly 
formulated psychology is the greatest weakness of this field. 
Dr. Shaffer’s Psychology of Adjustment marks the beginnings 
of maturity for mental hygiene. For the first time mental hygiene 
workers have a systematic presentation of the psychological basis 
of their discipline. A simple recital of chapter titles gives a picture 
of the organization: Human conduct and scientific method, The 
origins of behavior, The modification of behavior, Motivation, Adjustment, 
Varieties of adjustive behavior (five chapters), Personality 
(three chapters). The point-of-view of adjustment is never lost; 
it is the motif of the chapters dealing with general psychological 
problems, of those sections in which are discussed various types 
of adjustive behavior, and in the chapters dealing with practical 
techniques. 

Perhaps the most serious criticisms of the work are two. First, 
the psychologist may feel that too much space has been devoted to 
elementary topics. Perhaps this is not a valid criticism as the book 
is evidently intended—and without doubt would be most valuable 
reading—for physicians, teachers, social workers, and other non- 
psychological mental hygiene workers. The second criticism is 
more serious. “An objective approach to mental hygiene” is the 
sub-title. Unfortunately Dr. Shaffer has limited the “objective 
approach” to the conditioned response and other hackneyed concepts 
of behaviorism. It is questionable whether these alone constitute 
objectivity. Certainly Kantor’s interpretation of behavior as organis- 
mic interaction is equally objective, and in this reviewer’s experience 
is more applicable to clinical situations. 

Regardless of these criticisms this book is, and will probably be 
for some time, the most adequate textbook in mental hygiene. It 
should be read by every mental hygienist whether his interest is 
medical, social, educational, psychological, or only that of an intelligent 
layman. C. M. Louttit. 
Indiana University. 


JAIME CASTIELLO. A Humane Psychology of Education. New York: 
Sheed and Ward, 1936, pp. xxiii + 254. 


In this essay on educational psychology to which Louis J. A. 
Mercier has contributed an introduction, the author severely indicts 
experimental psychology for its mechanistic trends. The viewpoint 
is that of scholastic philosophy and the small volume is essentially 
a critique rather than an exposition of educational psychology as 
a whole. The principal point of attack is Thorndike’s theory of 
learning, which is rejected on the ground that it pertains only to 
the lower forms of learning and not to the higher activities that are 
the attributes of the human personality exclusively. 

Fragments of other topics of educational psychology receive 
mention. When the author briefly recognizes such problems as 
the measurement of human traits, he discloses unfamiliarity with 
the facts. Such conspicuous errors as those in the following excerpts 
can only weaken the philosophical argument that forms the essence 
of the essay: 


(Achievement tests.) Of course the results of such tests can be tabulated 
beautifully. If 50 per cent of the blanks have been filled in successfully, we 
have 50 per cent achievement. But their weakness lies in the fact that they 
control nothing but memory. Used as memory tests they are perfectly 
legitimate. P. 37. 

On the other hand, the much maligned essay-system of examination takes 
in the whole of the mind working in a real situation and controls not only the 
facts (one cannot write an essay without factual knowledge) but also man’s 
capacity for controlling his facts, judging them, coördinating them and using 
them as a means for further investigation. In other words it controls the 
quantity and the quality of achievement. P. 38. 

The chief utility of mental tests is that they enable us to compare different 
national or racial groups which otherwise would not be comparable for lack 
of a common standard of measurement. P. 40. 


Such egregious misstatements hardly reveal a sufficient acquaint- 
ance with educational psychology to justify the condemnation of 
its tools. Some inconsistency is shown by a tendency to accept Spear- 
man’s definition of intelligence and the rejection of the methods by 
which such a definition was made possible. It is significant that the 
author does not employ any of the scientific evidence that has accu- 
mulated to disprove theories of learning that are incompatible with 
both scientific fact and philosophical principle. 

It is unfortunate that this essay may be confused by some with 
scholastic educational psychology. Considered solely as a criticism 
of a particular theory, it is incomplete and its manifest limitations 
will detract from the respect that its fundamental principles should 
enjoy. T. G. Foran. 

The Catholic University of America. 